connecting racks of servers organized
as multiple rows on the data-center
floor. The data-center network topology and resulting cable complexity
is “baked in” and remains a rigid fixture of the cluster. Managing cable
complexity is nontrivial, which is immediately evident from the intricately
woven tapestry of fiber-optic cabling
laced throughout the data center. It
is not uncommon to run additional
fiber for redundancy, in the event of a
cable failure in a “bundle” of fiber or
for planned bandwidth growth. Fiber
cables are carefully measured to allow
some slack and to satisfy the cable’s
bend radius, and they are meticulously
labeled to make troubleshooting less
of a needle-in-a-haystack exercise.
Reliable and Available
Abstraction is the Archimedes lever
that lifts many disciplines within computer science and is used extensively
in both computer system design and
software engineering. Like an array
of nested Russian dolls, the network-programming model provides abstraction between successive layers of the
networking stack, enabling platform-independent access to both data and
system management. One example of this type of abstraction is the protocol buffer [21], which provides a structured message-passing interface for Web applications written in C++, Java, or Python.
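As a rough sketch of what this abstraction looks like in practice (the schema, module, and field names below are hypothetical, assuming a search.proto definition compiled by protoc into a Python module named search_pb2):

    import search_pb2  # hypothetical module generated by protoc from search.proto

    request = search_pb2.SearchRequest()    # structured message, not raw bytes
    request.query = "data-center networking"
    request.page_number = 1

    wire = request.SerializeToString()      # language-neutral wire format
    decoded = search_pb2.SearchRequest()
    decoded.ParseFromString(wire)           # a C++ or Java peer can decode the same bytes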
Perhaps the most common abstraction used in networking is the notion
of a communication channel as a virtual
resource connecting two hosts. The TCP communication model, for example, provides this abstraction to the programmer in the form of a file descriptor: reads and writes performed on the socket result in corresponding network transactions that are hidden from the user application. In
much the same way, the InfiniBand QP
(queue-pair) verb model provides an
abstraction for the underlying send/
receive hardware queues in the NIC.
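A minimal Python sketch of the socket abstraction described above; the host, port, and request bytes are placeholders chosen only for illustration:

    import socket

    # create_connection returns an object wrapping an ordinary file descriptor.
    with socket.create_connection(("example.com", 80)) as sock:
        # A write on the socket becomes one or more TCP segments on the wire.
        sock.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
        # A read returns reassembled, in-order bytes; retransmission and
        # congestion control remain hidden from the application.
        reply = sock.recv(4096)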
Besides providing a more intuitive programming interface, abstraction also
serves as a protective sheath around
software when faults arise, depositing
layers of software sediment to insulate it from critical faults (for example,
memory corruption or, worse, host
power-supply failure).
Bad things happen to good software. Web applications must be designed to be fault aware and, to the extent possible, resilient in the presence
of a variety of failure scenarios [10]. The
network is responsible for the majority of the unavailability budget for a
modern cluster. Whether it is a rogue
gamma ray causing a soft error in
memory or an inattentive worker accidentally unearthing a fiber-optic line,
the operating system and underlying hardware substrate must work in concert to
foster a robust ecosystem for Web applications.
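One common way an application stays fault aware is to bound every network operation with a timeout and retry with backoff; the sketch below is only illustrative, and its names and parameters are invented for the example:

    import random
    import socket
    import time

    def fetch_with_retries(host, port, payload, attempts=3, timeout=1.0):
        """Bound each attempt with a timeout and back off between retries."""
        for attempt in range(attempts):
            try:
                with socket.create_connection((host, port), timeout=timeout) as sock:
                    sock.sendall(payload)
                    return sock.recv(4096)
            except OSError:
                # Treat timeouts and resets as transient; back off with jitter.
                time.sleep((2 ** attempt) * 0.1 + random.random() * 0.1)
        raise RuntimeError("all %d attempts failed" % attempts)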
The data-center network serves as
a “central nervous system” for information exchange between cooperating tasks. The network’s functionality
is commonly divided into control and
data planes. The control plane provides an ancillary network juxtaposed
with the data network and tasked with
“command and control” for the primary data plane. The control plane is
an autonomic system for configuration, fault detection and repair, and
monitoring of the data plane. The
control plane is typically implemented as an embedded system within
each switch component and is tasked
with fault detection, notification, and
repair when possible.
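As a rough sketch of the detection side of this loop (the per-link counters and threshold below are invented for illustration, not drawn from any particular switch):

    def faulty_links(error_counts, bits_transferred, max_error_rate=1e-6):
        """Flag links whose observed bit-error rate exceeds a threshold.

        error_counts and bits_transferred are hypothetical per-link counters
        sampled from switch hardware; the threshold is illustrative.
        """
        flagged = []
        for link, errors in error_counts.items():
            bits = bits_transferred.get(link, 0)
            if bits and errors / bits > max_error_rate:
                flagged.append(link)  # notify and trigger rerouting for this link
        return flagged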
For example, when a network link
fails or has an uncharacteristically
high number of transmission errors,
the control plane will reroute traffic to avoid the faulty link. This entails recomputing the routes according to the routing algorithm and installing new entries in the routing tables of the affected switches. Of course, the effects of this patchwork are not instantaneous. Once the routing algorithm computes new routes that take the faulty links into account, it must disseminate those routes to the affected switches. The time needed for this information exchange is referred to as convergence time, and a primary goal of the routing protocol is to keep it as short as possible.
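To make the recompute-and-install step concrete, the sketch below (which stands in for no particular routing protocol, and whose toy topology is invented) recomputes shortest-path next hops after a link failure and reports which routing-table entries a switch would need to reinstall:

    from collections import deque

    def next_hops(links, src):
        """Breadth-first shortest paths from src; returns {destination: first hop}."""
        adj = {}
        for a, b in links:
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
        hops, seen = {}, {src}
        queue = deque((n, n) for n in adj.get(src, ()))
        while queue:
            node, first = queue.popleft()
            if node in seen:
                continue
            seen.add(node)
            hops[node] = first
            for nbr in adj[node]:
                if nbr not in seen:
                    queue.append((nbr, first))
        return hops

    links = [("s1", "s2"), ("s2", "s3"), ("s1", "s3")]               # toy three-switch topology
    before = next_hops(links, "s1")
    after = next_hops([l for l in links if l != ("s1", "s3")], "s1")  # link s1-s3 fails
    changed = {d for d in before if before[d] != after.get(d)}       # entries s1 must reinstall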
Fault recovery is a very complicated
subject and confounds all but the simplest of data-center networks. Among
the complicating factors are marginal
links that cause “flapping” by transitioning between active and inactive
(that is, up and down), repeatedly creat-