are the edges that connect them. The
topology is central to both the performance and cost of the network. The
topology affects a number of design
trade-offs, including performance,
system packaging, path diversity, and
redundancy, which, in turn, affect the
network’s resilience to faults, average
and maximum cable length, and, of
course, cost.12
The Cisco Data Center Infrastructure 3.0 Design Guide6 describes
common practices based on a tree-like
topology15 resembling early telephony
networks proposed by Charles Clos,7
with bandwidth aggregation at different levels of the network.
A fat-tree or folded-Clos topology,
similar to that shown in Figure 2, has
an aggregate bandwidth that grows in
proportion to the number of host ports
in the system. A scalable network is
one in which increasing the number
of ports in the network linearly
increases the delivered bisection bandwidth. Scalability and reliability are
inseparable, since growing to a large system size requires a robust network.
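
To make the linear-scaling claim concrete, the sketch below sizes a three-level fat-tree built from identical k-port switches. The k-ary construction (k^3/4 hosts, 5k^2/4 switches), the function name fat_tree_size, and the 10 Gb/s link rate are assumptions chosen for illustration; they are not taken from the text or from Figure 2.

```python
def fat_tree_size(k: int, link_gbps: float = 10.0) -> dict:
    """Size a three-level k-ary fat-tree built from k-port switches.

    Assumed construction: k pods, each with k/2 edge and k/2 aggregation
    switches, plus (k/2)^2 core switches. link_gbps is a hypothetical rate.
    """
    if k < 2 or k % 2:
        raise ValueError("k must be an even integer >= 2")
    hosts = k ** 3 // 4            # (k/2) hosts per edge switch, k/2 edges, k pods
    switches = 5 * k ** 2 // 4     # k^2 edge+aggregation switches, k^2/4 core
    # With full bisection, half the hosts can send to the other half at line rate.
    bisection_gbps = hosts / 2 * link_gbps
    return {"hosts": hosts, "switches": switches, "bisection_gbps": bisection_gbps}


for k in (4, 8, 16, 48):
    size = fat_tree_size(k)
    per_host = size["bisection_gbps"] / size["hosts"]
    print(k, size["hosts"], size["switches"], size["bisection_gbps"], per_host)
```

Under these assumptions the bisection bandwidth per host stays constant as k grows, which is the sense in which the delivered bandwidth scales linearly with the number of host ports.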
Network addressing. A host’s
address is how endpoints are identified
in the network. Endpoints are distinguished from intermediate switching elements traversed en route since
messages are created by and delivered
to an endpoint. In the simplest terms,
the address can be thought of as the
numerical equivalent of a host name
similar to that reported by the Unix
hostname command.
An address is unique and must be
represented in a canonical form that
can be used by the routing function to
determine where to route a packet. The
switch inspects the packet header corresponding to the layer in which routing is performed; for example, the IP address from layer 3 or the Ethernet address
from layer 2. Switching over Ethernet
involves ARP (address resolution protocol) and RARP (reverse address resolution protocol), which broadcast messages on the layer 2 network to update
local caches mapping layer 2 to layer
3 addresses and vice versa. Routing at
layer 3 requires each switch to maintain a subnet mask and assign IP addresses statically or disseminate host
addresses using DHCP (dynamic host
configuration protocol), for example.
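
As a rough illustration of how a subnet mask is used at layer 3, the sketch below checks whether a destination address falls inside the sender's subnet, which is what decides between local delivery and forwarding to a router. The helper name same_subnet and the addresses are hypothetical; only Python's standard ipaddress module is used.

```python
import ipaddress


def same_subnet(src: str, dst: str, prefix_len: int) -> bool:
    """Return True if dst lies in the subnet that src belongs to.

    prefix_len is the number of leading 1 bits in the subnet mask
    (for example, 24 corresponds to 255.255.255.0).
    """
    src_net = ipaddress.ip_interface(f"{src}/{prefix_len}").network
    return ipaddress.ip_address(dst) in src_net


# Example addresses are made up for illustration.
print(same_subnet("10.1.2.3", "10.1.2.200", 24))  # same /24
print(same_subnet("10.1.2.3", "10.1.3.7", 24))    # different /24
print(same_subnet("10.1.2.3", "10.1.3.7", 16))    # same /16
```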
The layer 2 routing tables are automatically populated when a switch is
plugged in and learns its identity and
exchanges route information with its
peers; however, the capacity of the
packet-forwarding tables is limited to,
say, 64K entries. Further, each layer 2
switch will participate in an STP (spanning
tree protocol) or use the TRILL
(transparent interconnect of lots of
links) link-state protocol to exchange
routing information and avoid transient routing loops that may arise while
the link state is exchanged among
peers. Neither layer 2 nor layer 3 routing is perfectly suited to data-center
networks, so to overcome these limitations many new routing algorithms
have been proposed (for example, PortLand1, 18 and VL211).
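
The self-populating, capacity-limited layer 2 table described above can be sketched as follows. This is a simplified model, not the behavior of any particular switch: the class name LearningTable and the evict-oldest policy are assumptions, and real switches typically age entries out with timers in hardware tables.

```python
from collections import OrderedDict
from typing import Optional


class LearningTable:
    """Toy layer 2 forwarding table: learns source MACs, bounded in size."""

    def __init__(self, capacity: int = 65536):  # "say, 64K entries"
        self.capacity = capacity
        self.entries = OrderedDict()  # MAC address -> ingress port

    def learn(self, src_mac: str, in_port: int) -> None:
        """Record that src_mac was seen on in_port."""
        if src_mac in self.entries:
            self.entries.move_to_end(src_mac)   # refresh an existing entry
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)    # assumed policy: evict the oldest
        self.entries[src_mac] = in_port

    def lookup(self, dst_mac: str) -> Optional[int]:
        """Return the egress port, or None if unknown (a switch would flood)."""
        return self.entries.get(dst_mac)


table = LearningTable(capacity=2)
table.learn("aa:aa:aa:aa:aa:01", 1)
table.learn("aa:aa:aa:aa:aa:02", 2)
table.learn("aa:aa:aa:aa:aa:03", 3)   # table is full, so the first entry is evicted
print(table.lookup("aa:aa:aa:aa:aa:01"), table.lookup("aa:aa:aa:aa:aa:03"))
```

Once the number of active hosts exceeds the table capacity, lookups for evicted addresses miss and the traffic must be flooded, which is one reason a flat layer 2 network is hard to scale.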
Routing. The routing algorithm determines the path a packet traverses
through the network. A packet’s route,
or path, through the network can be
fixed when the message is created (source routing), or it can be
determined hop by hop in a distributed
manner as the packet visits intermediate
switches. Source routing requires that
every endpoint know the prescribed
path to reach all other endpoints, and
each source-routed packet carries the
full information to determine the set of
port/link traversals from source to destination endpoint. As a result of this
overhead and inflexible fault handling,
source-routed packets are generally
used only for topology discovery and
network initialization, or during fault
recovery when the state of a switch is
unknown. A more flexible method of
routing uses distributed lookup tables
at each switch, as shown in Figure 3.
For example, consider a typical Ethernet switch. When a packet arrives at
a switch input port, the switch uses fields from
the packet header to index into a lookup table and determine the next hop,
or egress port, from the current switch.
A good topology will have abundant
path diversity, in which multiple possible egress ports may exist, with each
one leading to a distinct path. Path diversity in the topology makes ECMP
(equal-cost multipath) routing possible; in that
case the routing algorithm attempts to
load-balance the traffic flowing across
the links by spreading traffic uniformly.
To accomplish this uniform spreading,
the routing function in the switch will
hash several fields of the packet header
to produce a deterministic egress port.
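
A minimal sketch of this hash-based selection is shown below. The choice of header fields (a five-tuple), the CRC32 hash, and the function name ecmp_egress_port are assumptions for illustration; actual switches implement vendor-specific hash functions in hardware.

```python
import zlib
from typing import Sequence


def ecmp_egress_port(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                     proto: int, egress_ports: Sequence[int]) -> int:
    """Pick one of several equal-cost egress ports by hashing header fields.

    The same flow always hashes to the same port, so its packets follow one
    path and stay in order, while different flows spread across the ports.
    """
    if not egress_ports:
        raise ValueError("at least one egress port is required")
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return egress_ports[zlib.crc32(key) % len(egress_ports)]


# Hypothetical switch with four equal-cost uplinks, numbered 12 to 15.
uplinks = [12, 13, 14, 15]
port = ecmp_egress_port("10.0.0.1", "10.0.1.9", 40000, 80, 6, uplinks)
print(port)
```

Because the hash is computed per flow rather than per packet, the spreading is uniform only on average over many flows; a few large flows that collide on one port can still leave the links unevenly loaded.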