ware should be able to assign any IP address the service
requests to any server, and virtual machines should be
able to migrate to any server while keeping the same IP
address. Finally, features like link-local broadcast, on
which many legacy applications depend, should work.
We design, implement, and evaluate VL2, a network
architecture for data centers that meets these three objectives and thereby achieves agility.
Design philosophy: In designing VL2, a primary goal
was to create a network architecture that could be deployed
today, so we refrain from making any changes to the hardware of the switches or servers, and we require that legacy applications work unmodified. Our approach is to build
a network that operates like a very large switch—choosing
simplicity and high performance over other features when
needed. We sought to use robust and time-tested control plane protocols and avoided adaptive routing schemes that
might theoretically offer more bandwidth but open thorny
problems that might not need to be solved and would take
us away from vanilla, commodity, high-capacity switches.
We observe, however, that the software and operating systems on data center servers are already extensively modified
(e.g., to create hypervisors for virtualization or blob file systems to store data across servers). Therefore, VL2’s design
explores a new split in the responsibilities between host and
network—using a layer-2.5 shim in servers’ network stack
to work around limitations of the network devices. No new
switch software or switch APIs are needed.
Topology: VL2 consists of a network built from low-cost switch ASICs arranged into a Clos topology that provides extensive path diversity between servers. This design
replaces today’s mainframe-like large, expensive switches
with broad layers of low-cost switches that can be scaled out
to add more capacity and resilience to failure. In essence,
VL2 applies the principles of RAID (redundant arrays of
inexpensive disks) to the network.
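To make the scale-out structure concrete, the sketch below builds a small Clos fabric and counts the distinct up-down paths between two racks. It is a minimal illustration in Python: the port counts and the names build_clos and up_down_paths are assumptions for this example, not parameters or code from VL2.

```python
# Illustrative two-layer Clos fabric (parameters are assumptions, not VL2's
# deployment): every aggregation switch links to every intermediate switch,
# so path diversity between racks grows with the width of the upper layers
# rather than with the size of any single switch.
from itertools import product

def build_clos(n_int=4, n_agg=6, n_tor=12):
    """Return the link set of a small folded-Clos topology."""
    links = set()
    # Full bipartite mesh between the aggregation and intermediate layers.
    links.update((f"agg{a}", f"int{i}")
                 for a, i in product(range(n_agg), range(n_int)))
    # Each ToR attaches to two aggregation switches for redundancy.
    for t in range(n_tor):
        links.add((f"tor{t}", f"agg{t % n_agg}"))
        links.add((f"tor{t}", f"agg{(t + 1) % n_agg}"))
    return links

def up_down_paths(links, src_tor, dst_tor):
    """Count distinct ToR -> agg -> intermediate -> agg -> ToR paths."""
    up = {b for a, b in links if a == src_tor}     # aggs above the source rack
    down = {b for a, b in links if a == dst_tor}   # aggs above the destination rack
    ints = {i for a, i in links if a.startswith("agg")}
    return sum(1 for a1, i, a2 in product(up, ints, down)
               if (a1, i) in links and (a2, i) in links)

if __name__ == "__main__":
    links = build_clos()
    # Two uplinks per ToR and four intermediate switches give 2 * 4 * 2 = 16 paths.
    print(up_down_paths(links, "tor0", "tor3"))
```

A failed switch or link removes only a fraction of these paths rather than severing connectivity, which is the RAID-like property appealed to above.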
Traffic engineering: Our measurements show that data centers have tremendous volatility in their workload, their traffic, and their failure patterns. To cope with this volatility
in the simplest manner, we adopt Valiant Load Balancing
(VLB) to spread traffic across all available paths without any
centralized coordination or traffic engineering. Using VLB,
each server independently picks a path at random through
the network for each of the flows it sends to other servers
in the data center. Our experiments verify that using this
design achieves both uniform high capacity and performance isolation.
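As a sketch of this randomization at flow granularity, the code below pins each new flow to a randomly chosen intermediate switch, so traffic spreads across paths without coordination while packets within a flow stay on one path and are not reordered. The names pick_path and INTERMEDIATES and the in-memory table are assumptions for the illustration, not VL2's actual shim code.

```python
# Valiant Load Balancing sketch: each flow independently and randomly picks an
# intermediate switch to bounce through, with no central traffic engineering.
# The per-flow table keeps a flow on a single path so its packets stay in order.
import random

INTERMEDIATES = [f"int{i}" for i in range(4)]   # illustrative top-of-Clos switches
_flow_to_path = {}                              # per-sender flow pinning

def pick_path(src_ip, dst_ip, src_port, dst_port, proto="tcp"):
    """Return the intermediate switch this flow's packets are sent through."""
    flow = (src_ip, dst_ip, src_port, dst_port, proto)
    if flow not in _flow_to_path:
        _flow_to_path[flow] = random.choice(INTERMEDIATES)
    return _flow_to_path[flow]

if __name__ == "__main__":
    # Two flows between the same pair of servers may take different paths.
    print(pick_path("10.0.0.7", "10.0.3.9", 5012, 80))
    print(pick_path("10.0.0.7", "10.0.3.9", 5013, 80))
```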
Control plane: The switches that make up the network
operate as layer-3 routers with routing tables calculated
by OSPF, thereby enabling the use of multiple paths while
using a time-tested protocol. However, the IP addresses
used by services running in the data center must not be
tied to particular switches in the network, or the ability for
agile reassignment of servers between services would be
lost. Leveraging a trick used in many systems,9 VL2 assigns
servers IP addresses that act as names alone, with no topological significance. When a server sends a packet, the
shim layer on the server invokes a directory system to learn
the actual location of the destination and then tunnels the
original packet there. The shim layer also helps eliminate
the scalability problems created by ARP in layer-2 networks,
and the tunneling improves our ability to implement VLB.
These aspects of the design enable VL2 to provide layer-2
semantics—eliminating the fragmentation and waste of
server pool capacity that the binding between addresses
and locations causes in the existing architecture.
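The sketch below illustrates the send path of such a shim under simplified assumptions: look up the destination's topologically meaningless address in a directory, then tunnel the original packet to the locator the directory returns. The in-memory DIRECTORY, the Packet and Encapsulated types, and shim_send are hypothetical names for this example; the real shim lives in the server's network stack and queries a separate directory system.

```python
# Simplified shim send path (all names and the in-memory "directory" are
# illustrative assumptions): server addresses carry no location information,
# so the shim resolves each destination to the locator of the switch
# currently hosting it and tunnels the unmodified packet there.
from dataclasses import dataclass

# Directory: name-like server address -> locator of the ToR currently hosting it.
DIRECTORY = {
    "20.0.0.55": "10.1.2.1",
    "20.0.0.56": "10.3.7.1",
}

@dataclass
class Packet:
    src: str
    dst: str
    payload: bytes

@dataclass
class Encapsulated:
    outer_dst: str   # locator learned from the directory
    inner: Packet    # original packet, carried unmodified

def shim_send(pkt: Packet) -> Encapsulated:
    """Resolve the destination's location and tunnel the original packet to it."""
    locator = DIRECTORY.get(pkt.dst)
    if locator is None:
        raise LookupError(f"directory has no entry for {pkt.dst}")
    return Encapsulated(outer_dst=locator, inner=pkt)

if __name__ == "__main__":
    out = shim_send(Packet(src="20.0.0.12", dst="20.0.0.55", payload=b"hello"))
    print(out.outer_dst)   # switches route on the locator, never on the name-like address
```

In this model, moving a server or virtual machine only changes its directory entry; the address it exposes to applications stays the same.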
Contributions: In the course of this paper, we describe
the current state of data center networks and the traffic
across them, explaining why these are important to designing a new architecture. We present VL2’s design, which we
have built and deployed on an 80-server cluster. Using
the cluster, we experimentally validate that VL2 has the
properties set out as objectives, such as uniform capacity and performance isolation. We also demonstrate the
speed of the network, such as its ability to shuffle 2.7TB
of data among 75 servers in 395s (averaging 58.8Gbps).
Finally, we describe our experience applying VLB in a new
context, the inter-switch fabric of a data center, and show
that VLB smooths utilization while eliminating persistent
congestion.
2. BACKGROUND
In this section, we first explain the dominant design pattern
for data center architecture today.5 We then discuss why this
architecture is insufficient to serve large cloud-service data
centers.
As shown in Figure 1, the network is a hierarchy reaching from a layer of servers in racks at the bottom to a layer
of core routers at the top. There are typically 20–40 servers
per rack, each singly connected to a Top of Rack (ToR) switch
with a 1Gbps link. ToRs connect to two aggregation switches for redundancy, and these switches aggregate traffic further before connecting to access routers. At the top of the hierarchy, core
routers carry traffic between access routers and manage
traffic into and out of the data center. All links use Ethernet
as a physical-layer protocol, with a mix of copper and fiber
cabling. All switches below each pair of access routers form
a single layer-2 domain. The number of servers in a single
Figure 1. A conventional network architecture for data centers (adapted from figure by Cisco5). [The figure shows the Internet connected through core routers (CR) and access routers (AR) in the layer-3 portion of the data center, with aggregation switches (AS), ToR switches, and racks of servers below; all switches under one pair of access routers form a single layer-2 domain.]
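To see how traffic aggregates as it climbs this tree, the short calculation below is a back-of-the-envelope sketch: only the 20–40 servers per rack and the 1Gbps server links come from the description above, while the uplink capacities and fan-outs are purely illustrative assumptions, not measurements from the paper.

```python
# Back-of-the-envelope sketch with ILLUSTRATIVE uplink numbers: the links above
# each layer of the conventional hierarchy carry the combined demand of
# everything below them, so per-server capacity shrinks toward the core.
SERVERS_PER_RACK = 40        # from the text: 20-40 servers per rack
SERVER_LINK_GBPS = 1         # from the text: 1Gbps server links
TOR_UPLINK_GBPS = 2 * 10     # assumption: two 10Gbps uplinks per ToR
RACKS_PER_AGG_PAIR = 20      # assumption: racks under one aggregation pair
AGG_UPLINK_GBPS = 4 * 10     # assumption: aggregation-to-access capacity

rack_demand = SERVERS_PER_RACK * SERVER_LINK_GBPS
print(f"leaving the rack:        {rack_demand / TOR_UPLINK_GBPS:.0f}:1 oversubscribed")

agg_demand = RACKS_PER_AGG_PAIR * TOR_UPLINK_GBPS
print(f"leaving the aggregation: {agg_demand / AGG_UPLINK_GBPS:.0f}:1 oversubscribed")
```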