Table 2: APIs added by SCTP.

API                          Explanation
sctp_bindx()                 Bind or unbind an SCTP socket to a list of addresses
sctp_connectx()              Connect an SCTP socket with multiple destination addresses
sctp_generic_recvmsg()       Receive data from a peer
sctp_generic_sendmsg(),      Send data to a peer
sctp_generic_sendmsg_iov()
sctp_getaddrlen()            Return the address length of an address family
sctp_getassocid()            Return an association ID for a specified socket address
sctp_getpaddrs(),            Return list of addresses to caller
sctp_getladdrs()
sctp_peeloff()               Detach an association from a one-to-many socket to a separate file descriptor
sctp_sendx()                 Send a message from an SCTP socket
sctp_sendmsgx()              Send a message from an SCTP socket
The first copy is performed by the network driver from the network device’s memory into the kernel’s memory, and the second is performed by the sockets layer in the kernel when the data is read by the user program. Each of these copy operations is expensive because it must occur for each message that the system receives. Similarly, when the program wants to send a message, data must be copied from the user’s program into the kernel for each message sent; then that data will be copied into the buffers used by the device to transmit it on the network.
Most operating-system designers and developers know that data copying is anathema to system performance and work to minimize such copies within the kernel. The easiest way for the kernel to avoid a data copy is to have device drivers copy data directly into and out of kernel memory. On modern network devices this is a result of how they structure their memory. The driver and kernel share two rings of packet descriptors (one for transmit and one for receive), where each descriptor has a single pointer to memory. The network device driver initially fills these rings with memory from the kernel. When data is received, the device sets a flag in the correct receive descriptor and tells the kernel, usually via an interrupt, that there is data waiting for it. The kernel then removes the filled buffer from the receive descriptor ring and replaces it with a fresh buffer for the device to fill. The packet, in the form of the buffer, then moves through the network stack until it reaches the socket layer, where it is copied out of the kernel when the user's program calls read(). Data sent by the program is handled in a similar way by the kernel, in that kernel buffers are eventually added to the transmit descriptor ring and a flag is then set to tell the device that it can place the data in the buffer on the network.
All of this work in the kernel leaves the last-copy problem unsolved, and several attempts have been made to extend the sockets API to remove this copy operation [1, 3]. The problem remains of how memory can be safely shared across the user/kernel boundary. The kernel cannot give its memory over to the user program, because at that point it loses control over the memory. A user program that crashes may leave the kernel without a significant chunk of usable memory, leading to system performance degradation. There are also security issues inherent in sharing memory buffers across the kernel/user boundary. There is no single answer to how a user program might achieve higher bandwidth using the sockets API.
For programmers who are more concerned with latency than with bandwidth, even less has been done. The only significant improvement for programs that are waiting for a network event has been the addition of a set of kernel events that a program can wait on. Kernel events, or kevents (used through the kqueue() and kevent() system calls on BSD systems), are an extension of the select() mechanism to encompass any possible event that the kernel might be able to tell the program about. Before the advent of kevents, a user program could call select() on any file descriptor, which would let the program know when any of a set of file descriptors was readable, writable, or had an error. When programs were written to sit in a loop and wait on a set of file descriptors (for example, reading from the network and writing to disk), the select() call was sufficient, but once a program wanted to check for other events, such as timers and signals, select() was no longer enough. The problem for low-latency applications is that kevents do not deliver data; like select(), they deliver only a notification that data is ready. The next logical step would be an event-based API that also delivers data. There is no reason to have the application cross the user/kernel boundary twice simply to get data the kernel already knows the application wants.
The sockets API not only presents performance problems to the application writer, but also narrows the type of communication that can take place. The client/server paradigm is inherently a 1:1 type of communication. Although a server may handle requests from a diverse group of clients, each client has only one connection to a single server for a request or set of requests. In a world in which each computer had only one network interface, that paradigm made perfect sense. A connection between a client and server is identified by a four-tuple of <Source IP, Source Port, Destination IP, Destination Port>. Since services generally have a well-known destination port (for example, 80 for HTTP) and the IP addresses are fixed, the only value that can easily vary is the source port.
In the Internet of 1982 each machine that was not a router had only a single network interface, meaning that to identify a service, such as a remote printer, the client computer needed a single destination address and port and had, itself, only a single source address and port to work with. While it did exist, the idea that a computer might have multiple ways of reaching a service was too complicated and far too expensive to implement. Given these constraints, there was no reason for the sockets API to expose to the programmer the ability to write a multihomed program—one that could manage
References: