and multihomed systems—that is,
those with multiple network interfaces. Many people confuse increasing
network bandwidth with higher performance, but increasing bandwidth
does not necessarily reduce latency.
The challenge for the sockets API is
giving the application faster access to
network data.
Any program using the sockets API sends and receives
data through calls to the operating system. All of these calls have one thing
in common: the calling program must
repeatedly ask for data to be delivered.
In a world of client/server computing
these constant requests make perfect
sense, because the server cannot do
anything without a request from the
client. It makes little sense for a print
server to call a client unless the client
has something it wishes to print. What,
however, if the service provided is music or video distribution? In a media
distribution service there may be one
or more sources of data and many listeners. For as long as the user is listening to or viewing the media, the most
likely case is that the application will
want whatever data has arrived. Specifically requesting new data is a waste
of time and resources for the application. The sockets API does not provide
the programmer a way in which to say,
“Whenever there is data for me, call me
to process it directly.”
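The "call me when there is data" style can be approximated in user space, and the following minimal sketch shows the idea using Python's standard selectors module. The names on_data, dispatch_once, and received are hypothetical, chosen only for illustration. Note that this does not remove the underlying problem: the dispatcher still has to ask the kernel which sockets are ready, so the callback is invoked by the program's own loop rather than by the kernel.

```python
import selectors
import socket

received = []

def on_data(sock):
    """Hypothetical application callback: invoked only when data is waiting."""
    received.append(sock.recv(4096))

def dispatch_once(sel, timeout=1.0):
    """Wait for ready sockets and call the callback registered for each one."""
    events = sel.select(timeout)
    for key, _mask in events:
        key.data(key.fileobj)  # key.data holds the callback passed to register()
    return len(events)

# A connected pair of local sockets stands in for a media source and a listener.
source, listener = socket.socketpair()
sel = selectors.DefaultSelector()
sel.register(listener, selectors.EVENT_READ, on_data)

source.sendall(b"frame-1")
handled = dispatch_once(sel)

sel.unregister(listener)
sel.close()
source.close()
listener.close()
```

The application code in on_data never asks for data itself, but the dispatcher underneath it still pays for a readiness check on every pass.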
Sockets programs are instead written
from the viewpoint of a dearth of, rather
than a wealth of, data. Network programs are so used to waiting on data that
they use a separate system call, select(), so that they can listen to multiple
sources of data without blocking on a
single request. The typical processing
loop of a sockets-based program isn’t
simply read(), process(), read(), but
instead select(), read(), process(),
select(). Although the addition of a
single system call to a loop would not
seem to add much of a burden, this is
not the case. Each system call requires
arguments to be marshaled and copied into the kernel, and can cause
the system to block the calling process
and schedule another. If there were data
available to the caller when it invoked
select(), then all of the work that went
into crossing the user/kernel boundary
was wasted because a read() would
have returned data immediately. The
constant check/read/check is wasteful
unless the time between successive requests is quite long.
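The cost of the extra call can be illustrated with a small experiment. The sketch below is only an illustration under stated assumptions: it counts calls made from Python as a stand-in for system calls, it uses a local socket pair rather than a real network, and the function names select_then_read, read_first, and run are hypothetical. It compares the conventional select()-then-read loop with a loop that tries a non-blocking read first and falls back to select() only when no data is waiting.

```python
import select
import socket

def select_then_read(sock, calls):
    """The conventional loop body: select(), then recv()."""
    calls["select"] += 1
    select.select([sock], [], [])
    calls["recv"] += 1
    return sock.recv(4096)

def read_first(sock, calls):
    """Try recv() first; fall back to select() only when no data is waiting."""
    while True:
        calls["recv"] += 1
        try:
            return sock.recv(4096)
        except BlockingIOError:
            calls["select"] += 1
            select.select([sock], [], [])

def run(reader, messages=3):
    """Send and read a few messages, returning how many calls the reader made."""
    sender, receiver = socket.socketpair()
    receiver.setblocking(False)
    calls = {"select": 0, "recv": 0}
    for i in range(messages):
        payload = b"msg%d" % i
        sender.sendall(payload)
        assert reader(receiver, calls) == payload
    sender.close()
    receiver.close()
    return calls

conventional = run(select_then_read)
optimistic = run(read_first)
print("select-then-read:", conventional)
print("read-first:      ", optimistic)
```

When data is already waiting, the read-first variant skips select() entirely. When data is scarce, it pays for a failed read before waiting, which is the trade-off the text describes.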
Solving this problem requires inverting the communication model between an application and the operating
system. Various APIs that allow
the kernel to call directly
into a program have been proposed, but
none has gained wide acceptance, for
a few reasons. The operating systems
that existed at the time the sockets API
was developed were, except in very esoteric circumstances, single threaded
and executed on single-processor computers. If the kernel had been fitted
with an up-call API, there would have
been the problem of which context the
call could have executed in. Having all
other work on a system pause because
the kernel was executing an up-call into
an application would have been unacceptable, particularly in timesharing
systems with tens to hundreds of users.
The only places in which such a software
architecture did gain currency were
embedded systems and network
routers, where there were no users and
no virtual memory.
The issue of virtual memory compounds the problems of implementing a kernel up-call mechanism. The
memory allocated to a user process is
virtual memory, but the memory used
by devices such as network interfaces
is physical. Having the kernel map
physical memory from a device into a
user-space program breaks one of the
fundamental protections provided by a
virtual memory system.
Attempts to Overcome Performance Issues
A couple of different mechanisms
have been proposed and sometimes
implemented on various operating
systems to overcome the performance
issues present in the sockets API. One
such mechanism is zero-copy sockets.
Anyone who has worked on a network
stack knows that copying data is what
kills the performance of networking
protocols. Therefore, to improve the
speed of networked applications that
are more interested in high bandwidth
than in low latency, the operating system is modified to remove as many data
copies as possible. Traditionally, an
operating system performs two copies
for each packet received by the system.