Table 2: APIs added by SCTP.

API                                                  Explanation
sctp_bindx()                                         Bind or unbind an SCTP socket to a list of addresses
sctp_connectx()                                      Connect an SCTP socket with multiple destination addresses
sctp_generic_recvmsg()                               Receive data from a peer
sctp_generic_sendmsg(), sctp_generic_sendmsg_iov()   Send data to a peer
sctp_getaddrlen()                                    Return the address length of an address family
sctp_getassocid()                                    Return an association ID for a specified socket address
sctp_getpaddrs(), sctp_getladdrs()                   Return a list of addresses to the caller
sctp_peeloff()                                       Detach an association from a one-to-many socket to a separate file descriptor
sctp_sendx()                                         Send a message from an SCTP socket
sctp_sendmsgx()                                      Send a message from an SCTP socket
The first copy is performed by the network driver from the network device’s
memory into the kernel’s memory, and
the second is performed by the sockets layer in the kernel when the data is
read by the user program. Each of these
copy operations is expensive because it
must occur for each message that the
system receives. Similarly, when the
program wants to send a message, data
must be copied from the user’s program into the kernel for each message
sent; then that data will be copied into
the buffers used by the device to transmit it on the network.
Most operating-system designers
and developers know that data copying
is anathema to system performance
and work to minimize such copies
within the kernel. The easiest way for
the kernel to avoid a data copy is to
have device drivers copy data directly
into and out of kernel memory. On
modern network devices this falls out of how the device's memory is organized. The driver and kernel share two
rings of packet descriptors—one for
transmit and one for receive—where
each descriptor has a single pointer
to memory. The network device driver
initially fills these rings with memory
from the kernel. When data is received, the device sets a flag in the correct receive descriptor and tells the
kernel, usually via an interrupt, that
there is data waiting for it. The kernel
then removes the filled buffer from the
receive descriptor ring and replaces it
with a fresh buffer for the device to fill.
The packet, in the form of the buffer,
then moves through the network stack
until it reaches the socket layer, where
it is copied out of the kernel when the
user’s program calls read(). Data sent
by the program is handled in a similar
way by the kernel, in that kernel buffers are eventually added to the transmit descriptor ring and a flag is then
set to tell the device that it can place
the data in the buffer on the network.
All of this work in the kernel leaves
the last copy problem unsolved, and
several attempts have been made to
extend the sockets API to remove this
copy operation.1,3
The problem remains as to how memory can be safely
shared across the user/kernel boundary. The kernel cannot give its memory
over to the user program, because at that
point it loses control over the memory.
A user program that crashes may leave
the kernel without a significant chunk
of usable memory, leading to system
performance degradation. There are
also security issues inherent in sharing
memory buffers across the kernel/user
boundary. There is no single answer to
how a user program might achieve higher bandwidth using the sockets API.
For programmers who are more concerned with latency than with bandwidth, even less has been done. The
only significant improvement for programs that are waiting for a network
event has been the addition of a set of
kernel events that a program can wait
on. Kernel events, or kevents(), are
an extension of the select() mechanism to encompass any possible event
that the kernel might be able to tell the
program about. Before the advent of
kevents, a user program could call
select() on any file descriptor, which
would let the program know when any
of a set of file descriptors was readable,
writable, or had an error. When programs were written to sit in a loop and
wait on a set of file descriptors—for example, reading from the network and
writing to disk—the select() call was
sufficient, but once a program wanted
to check for other events, such as timers and signals, select() no longer
served. The problem for low-latency
apps is that kevents() do not deliver
data; they deliver only a signal that data
is ready, just as the select() call did.
The next logical step would be to have
an event-based API that also delivered
data. There is no reason to have the application cross the user/kernel boundary twice simply to get the data the kernel knows the application wants.
Lack of support for multihoming
The sockets API not only presents performance problems to the application
writer, but also narrows the type of
communication that can take place.
The client/server paradigm is inherently a 1:1 type of communication. Although a server may handle requests
from a diverse group of clients, each
client has only one connection to a
single server for a request or set of requests. In a world in which each computer had only one network interface,
that paradigm made perfect sense. A
connection between a client and server
is identified by a quad of <Source IP,
Source Port, Destination IP, Destination Port>. Since services generally
have a well-known destination port (for
example, 80 for HTTP), the only value
that can easily vary is the source port,
since the IP addresses are fixed.
In the Internet of 1982 each machine that was not a router had only a
single network interface, meaning that
to identify a service, such as a remote
printer, the client computer needed
a single destination address and port
and had, itself, only a single source
address and port to work with. Although
the idea that a computer might have
multiple ways of reaching a service did
exist, it was too complicated and far too
expensive to implement. Given these
constraints, there was no reason for the
sockets API to expose to the programmer the ability to write a multihomed
program—one that could manage