idea. It is, however, much harder to do in real life than it first appears on the whiteboard.

CONTROL VS. DATA

Simply moving data directly off the network wire into application buffers is not sufficient. The delivery of packets must be coordinated with everything else the application is doing and all the other operating-system machinery behind the scenes. Because of this, the network protocol stack interacts with the rest of the operating system in exquisitely delicate ways. Truth be told, this coordination machinery is the lion’s share of the code in most stack implementations. The actual TCP state machine fits on a half page, once divorced from all the glue and scaffolding needed to integrate it with the rest of the system environment. It is precisely this subtle and complex control coupling that makes it surprisingly difficult to isolate a network protocol stack fully from its host operating system. There are multiple reasons why this interaction is such a rich breeding ground for implementation bugs, but one vast category is “abstraction mismatch.”
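To give a sense of how compact the bare state machine is, here is a minimal sketch of the connection-state transitions from the RFC 793 state diagram, written as a lookup table. The event names (`passive_open`, `rcv_syn`, and so on) are labels invented for this illustration, and the table deliberately leaves out resets, timers, retransmission, and all of the operating-system glue described above.

```python
# Simplified TCP connection state machine, after the RFC 793 state diagram.
# Event names are illustrative labels, not taken from any real stack.
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open"): "SYN_SENT",        # send SYN
    ("LISTEN", "rcv_syn"): "SYN_RCVD",            # send SYN+ACK
    ("SYN_SENT", "rcv_syn_ack"): "ESTABLISHED",   # send ACK
    ("SYN_SENT", "rcv_syn"): "SYN_RCVD",          # simultaneous open
    ("SYN_RCVD", "rcv_ack"): "ESTABLISHED",
    ("ESTABLISHED", "close"): "FIN_WAIT_1",       # send FIN
    ("ESTABLISHED", "rcv_fin"): "CLOSE_WAIT",     # send ACK
    ("FIN_WAIT_1", "rcv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_1", "rcv_fin"): "CLOSING",         # simultaneous close
    ("FIN_WAIT_2", "rcv_fin"): "TIME_WAIT",
    ("CLOSING", "rcv_ack"): "TIME_WAIT",
    ("CLOSE_WAIT", "close"): "LAST_ACK",          # send FIN
    ("LAST_ACK", "rcv_ack"): "CLOSED",
    ("TIME_WAIT", "timeout_2msl"): "CLOSED",
}


def step(state, event):
    """Return the next state, or raise if the event is not valid in this state."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state} on {event}") from None


def run(events, state="CLOSED"):
    """Feed a sequence of events through the machine and return the final state."""
    for event in events:
        state = step(state, event)
    return state
```

Everything that makes a real implementation large (buffer management, timers, wakeups, locking against the rest of the kernel) lives outside a table like this one.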

Because communications protocols inherently deal with multiple communicating entities, some assumptions must be made about the behavior of those entities. The degree to which those assumptions match between a host system and protocol code determines how difficult it will be to map to existing semantics and how much new structure and machinery will be required.

When networking first went into Berkeley Unix, subtleties on both sides required considerable effort to reconcile. There was a critical desire to make network connections appear to be natural extensions of existing Unix machinery: file descriptors, pipes, and the other ideas that make Unix conceptually compact. But because of radical differences in behavior, especially delay, it is impossible to completely disguise reading 1,000 bytes from a round-the-world network connection so that it appears indistinguishable from reading that same 1,000 bytes from a file on a local file system.
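A small sketch of where the disguise slips: a read on a local file hands back all 1,000 bytes at once, while the same request on a stream socket may return only what has arrived so far, so the caller needs a loop. The helper names `read_exactly` and `slow_sender` are invented for this example, and a local `socket.socketpair()` with short pauses merely stands in for a slow, distant connection.

```python
import socket
import tempfile
import threading
import time


def read_exactly(sock, n):
    """Keep calling recv() until n bytes have arrived or the peer closes."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)   # may return fewer bytes than requested
        if not chunk:                  # peer closed before sending everything
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def slow_sender(sock, payload, pieces=4, delay=0.01):
    """Send the payload in several pieces with pauses, imitating a slow path."""
    size = len(payload) // pieces
    for i in range(pieces):
        end = len(payload) if i == pieces - 1 else (i + 1) * size
        sock.sendall(payload[i * size:end])
        time.sleep(delay)
    sock.close()


def demo():
    payload = bytes(range(250)) * 4    # 1,000 bytes

    # Local file: a single read returns the whole 1,000 bytes.
    with tempfile.TemporaryFile() as f:
        f.write(payload)
        f.seek(0)
        from_file = f.read(1000)

    # Stream socket: data trickles in, so the reader has to loop.
    a, b = socket.socketpair()
    sender = threading.Thread(target=slow_sender, args=(a, payload))
    sender.start()
    from_net = read_exactly(b, 1000)
    sender.join()
    b.close()
    return from_file, from_net
```

The bytes end up identical; what differs is the timing and the partial results the caller has to cope with along the way.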

Networks have new behaviors that require new interfaces to capture and manage, but those new interfaces need to make sense with existing interfaces. This was hard work, and the modifications left few pieces of the system untouched; a few changed in profound ways.

BACK TO BASICS

The fundamental capabilities provided by a network protocol stack are data transfer, multiplexing, flow control, and error management. All of these functions are required for the coordinated delivery of data between endpoints across the Internet. That is the purpose of all the structure in the packet headers: to carry the control coordination information, as well as the payload data.
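As a rough illustration of how a header carries that control information next to the payload, the sketch below packs one field for each of the four capabilities. The 12-byte layout and the function names `encode` and `decode` are invented for this example and do not correspond to any real protocol.

```python
import struct
import zlib

# Hypothetical 12-byte header in network byte order:
#   port     (H)  multiplexing: which conversation this packet belongs to
#   window   (H)  flow control: how many more bytes the sender of this packet can accept
#   seq      (I)  data transfer: position of this payload in the byte stream
#   checksum (I)  error management: CRC-32 of the payload
HEADER = struct.Struct("!HHII")


def encode(port, seq, window, payload):
    """Prefix the payload with its control header."""
    return HEADER.pack(port, window, seq, zlib.crc32(payload)) + payload


def decode(packet):
    """Split a packet into control fields and payload, verifying the checksum."""
    port, window, seq, checksum = HEADER.unpack_from(packet)
    payload = packet[HEADER.size:]
    if zlib.crc32(payload) != checksum:
        raise ValueError("checksum mismatch: payload damaged in transit")
    return {"port": port, "seq": seq, "window": window, "payload": payload}
```

For example, `decode(encode(80, 1000, 4096, b"hello"))` hands back the same four values, while a packet whose payload was altered in transit is rejected.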

The critical observation is that the exact same operations are required to coordinate the interaction of a network protocol stack and the host operating system within a single system. When all the code is in the same place (i.e., running on the same processor), this signaling is easily done with simple procedure calls. If, however, the network protocol stack executes on a remote processor such as a TOE, this signaling must be done with an explicit protocol carried across whatever connects the front-end processor to the host operating system. This protocol is called an HFEP (host-front-end protocol).
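The contrast can be sketched in a few lines. In the toy below, `LocalStack` is driven by ordinary procedure calls, while `HostProxy` turns the same calls into framed messages that `front_end_dispatch` must decode on the other side. The opcodes, the 5-byte frame layout, and every class and function name are hypothetical, made up purely for illustration.

```python
import struct

# Hypothetical HFEP opcodes and frame layout, invented for this sketch.
OP_SEND, OP_CLOSE = 1, 2
FRAME = struct.Struct("!BHH")   # opcode, connection id, body length


class LocalStack:
    """Stack on the same processor: signaling is a plain procedure call."""

    def __init__(self):
        self.log = []

    def send(self, conn_id, data):
        self.log.append(("send", conn_id, data))

    def close(self, conn_id):
        self.log.append(("close", conn_id, b""))


def hfep_encode(opcode, conn_id, body=b""):
    """Build one frame: fixed header followed by the body."""
    return FRAME.pack(opcode, conn_id, len(body)) + body


class HostProxy:
    """Host side: same interface as LocalStack, but each call becomes a frame."""

    def __init__(self, link):
        self.link = link   # any object with append(); stands in for the bus

    def send(self, conn_id, data):
        self.link.append(hfep_encode(OP_SEND, conn_id, data))

    def close(self, conn_id):
        self.link.append(hfep_encode(OP_CLOSE, conn_id))


def front_end_dispatch(frame, stack):
    """Front-end side: decode a frame and make the call on the host's behalf."""
    opcode, conn_id, length = FRAME.unpack_from(frame)
    body = frame[FRAME.size:FRAME.size + length]
    if opcode == OP_SEND:
        stack.send(conn_id, body)
    elif opcode == OP_CLOSE:
        stack.close(conn_id)
    else:
        raise ValueError(f"unknown opcode {opcode}")
```

Even this toy needs an encoder on one side and a decoder on the other, and it still says nothing about completion notifications, flow control across the link, or error reporting back to the host.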

Designing an HFEP is not trivial, especially if a goal is that it be materially simpler than the protocol being offloaded to the remote processor. Historically, the HFEP has been the Achilles’ heel of NFE processors. The HFEP ends up being asymptotically as complex as the “primary” protocol being offloaded, so there is very little to gain in offloading it. In addition, the HFEP must be implemented twice: once in the host and once in the front-end processor, each one of those being a different host platform as far as the HFEP is concerned. Two implementations and two integrations with host operating systems mean twice as many sources of subtle race conditions, deadlocks, buffer starvations, and other nasty bugs. This cost needs a huge payoff to cover it.

BUT WAIT A MINUTE

About now some readers may be itching to throw a penalty flag for “unconvincing hand waving,” because even in the base case there is a protocol between the Ethernet interface and the host computer device driver. “Doesn’t that count?” you rightfully ask. Yes, indeed, it does.

There is a long history of peripheral chips being designed with absolutely dreadful interfaces. Such chips have been known to make device-driver writers contemplate slow, painful violence if they ever meet the chip designer in a dark alley. The very early Ethernet chips from one famous semiconductor company were absolute masterpieces of egregious overdesign. Not only did they contain many complex functions of dubious utility, but also the functions that were genuinely required suffered from the same virulent infestation of bugs that plagued the useless bits. Tom Lyon wrote a famous Usenix paper in 1985, “All the Chips that Fit,” delivering an epic rant on this expansive topic. (It should be required reading for anyone contemplating hardware design.)

References:

http://queue.acm.org
