Figure 7. VL2 testbed comprising 80 servers and 10 switches.
ports on each intermediate switch. The ToR switches have
two 10Gbps ports and 24 1Gbps ports. Each ToR is connected
to two aggregation switches via 10Gbps links, and to 20 servers via 1Gbps links. Internally, the switches use commodity Broadcom ASICs (the 56820 and 56514), although any
switch that supports line-rate L3 forwarding, OSPF, ECMP,
and IP-in-IP decapsulation will work.
Overall, our evaluation shows that VL2 provides an
effective substrate for a scalable data center network; VL2
achieves (1) 94% of optimal network capacity, (2) a TCP fairness
index of 0.995, (3) graceful degradation under failures with
fast reconvergence, and (4) 50K lookups/s under 10ms for
fast address resolution.
5.1. VL2 provides uniform high capacity
A central objective of VL2 is uniform high capacity between
any two servers in the data center. How closely do the performance and efficiency of a VL2 network match those of a
layer-2 switch with 1:1 oversubscription? To answer this
question, we consider an all-to-all data shuffle stress test:
all servers simultaneously initiate TCP transfers to all other
servers. This data shuffle pattern arises in large-scale sorts,
merges, and join operations in the data center, for example,
in Map/Reduce or DryadLINQ jobs [7, 22]. Application developers use these operations with caution today, because they
are so expensive in network resources. If data shuffles could be
supported efficiently, it would have a large impact on the overall algorithmic and data storage strategy.
We create an all-to-all data shuffle traffic matrix involving
75 servers. Each of the 75 servers must deliver 500MB of data to
each of the 74 other servers, a shuffle of 2.7TB from memory to memory. Figure 8 shows how the sum of the goodput
over all flows varies with time during a typical run of the
2.7TB data shuffle. During the run, the sustained utilization
of the core links in the Clos network is about 86%, and VL2
achieves an aggregate goodput of 58.8Gbps. The goodput is
very evenly divided among the flows for most of the run, with
a fairness index between the flows of 0.995 [15], where 1.0 indicates perfect fairness (mean goodput per flow 11.4Mbps,
standard deviation 0.75Mbps). This goodput is more than
10x what the network in our current data centers can achieve
with the same investment.
Figure 8. Aggregate goodput (Gbps) over time during a 2.7TB shuffle among 75 servers.
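The fairness figure quoted above can be illustrated with a short sketch. It assumes the index in question is Jain's fairness index, which the text does not spell out; the function names are ours and purely illustrative. The second helper uses the identity J = 1 / (1 + (σ/μ)²), which holds when σ is the population standard deviation of the per-flow rates.

```python
def jain_fairness_index(rates):
    """Jain's index: (sum x)^2 / (n * sum x^2); 1.0 means all rates are equal."""
    rates = list(rates)
    if not rates:
        raise ValueError("need at least one flow rate")
    total = sum(rates)
    sum_sq = sum(r * r for r in rates)
    if sum_sq == 0:
        return 1.0  # all flows idle: treat as trivially fair (our convention)
    return total * total / (len(rates) * sum_sq)


def jain_from_mean_std(mean, std):
    """Same index computed from the mean and population standard deviation."""
    if mean <= 0:
        raise ValueError("mean rate must be positive")
    return 1.0 / (1.0 + (std / mean) ** 2)


# Per-flow statistics reported in the text (Mbps).
print(round(jain_from_mean_std(11.4, 0.75), 4))
```

Plugging in the reported mean (11.4Mbps) and standard deviation (0.75Mbps) gives a value of about 0.9957, in line with the reported index of 0.995, assuming the reported deviation is a population standard deviation.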
We measure how close VL2 gets to the maximum achievable throughput in this environment by computing the
goodput efficiency for this data transfer. Goodput efficiency is defined as the ratio of the sent goodput summed
over all interfaces divided by the sum of the interface
capacities. An efficiency of 1.0 would mean that all the
capacity on all the interfaces is entirely used carrying useful bytes from the time the first flow starts to when the last
flow ends. The VL2 network achieves an efficiency of 94%,
with the difference from perfect being due to the encapsulation headers (3.8%), TCP congestion control dynamics,
and TCP retransmissions.
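The definition of goodput efficiency can be written down directly. The sketch below is a minimal illustration under our own assumptions: per-interface goodput is taken as an average over the interval from the first flow's start to the last flow's end, and the input numbers are hypothetical rather than measurements from the testbed.

```python
def goodput_efficiency(goodput_bps, capacity_bps):
    """Ratio of sent goodput summed over all interfaces to summed interface capacity.

    goodput_bps[i] is the average useful-byte rate sent on interface i over the
    whole transfer; capacity_bps[i] is that interface's line rate.
    """
    if len(goodput_bps) != len(capacity_bps):
        raise ValueError("one goodput value is required per interface")
    total_capacity = sum(capacity_bps)
    if total_capacity <= 0:
        raise ValueError("total capacity must be positive")
    return sum(goodput_bps) / total_capacity


# Hypothetical example: four 1Gbps interfaces, each averaging 940Mbps of goodput.
capacities = [1_000_000_000] * 4
goodputs = [940_000_000] * 4
print(goodput_efficiency(goodputs, capacities))
```

An efficiency below 1.0 in this sketch lumps together every source of loss; separating header overhead from retransmissions would require per-packet accounting that is not modeled here.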
This 94% efficiency combined with the fairness index
of 0.995 demonstrates that VL2 can achieve uniform high
bandwidth across all servers in the data center.
5.2. VL2 provides performance isolation
One of the primary objectives of VL2 is agility, which we
define as the ability to assign any server anywhere in the
data center to any service (Section 1). Achieving agility critically depends on providing sufficient performance isolation
between services so that if one service comes under attack or
a bug causes it to spray packets, it does not adversely impact
the performance of other services.
Performance isolation in VL2 rests on the mathematics
of VLB: any traffic matrix that obeys the hose model
is routed by splitting it across intermediate nodes in equal ratios
(through randomization), which prevents persistent hot
spots. Rather than have VL2 perform admission control or
rate shaping to ensure the traffic offered to the network conforms to the hose model, we instead rely on TCP to ensure
that each flow offered to the network is rate limited to its fair
share of its bottleneck.
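The VLB argument can be sketched numerically. The code below uses a deliberately simplified abstraction that is our assumption, not the VL2 topology: every node has one dedicated link to each of k intermediate nodes, and each flow is split evenly across them. Under that model, a traffic matrix that obeys the hose model (each row sum and each column sum bounded by the node's capacity) loads every link with at most capacity / k. All names are illustrative.

```python
def conforms_to_hose(tm, ingress_cap, egress_cap):
    """True if no node sends more than ingress_cap or receives more than egress_cap."""
    n = len(tm)
    for i in range(n):
        sent = sum(tm[i])
        received = sum(tm[j][i] for j in range(n))
        if sent > ingress_cap or received > egress_cap:
            return False
    return True


def vlb_link_loads(tm, num_intermediates):
    """Per-link loads when every flow is split equally over the intermediates.

    Returns (up, down): up[i] is the load on each link from node i to an
    intermediate, down[i] is the load on each link from an intermediate to node i.
    """
    n = len(tm)
    k = num_intermediates
    up = [sum(tm[i]) / k for i in range(n)]
    down = [sum(tm[j][i] for j in range(n)) / k for i in range(n)]
    return up, down


# Hypothetical 3-node traffic matrix in Mbps, node capacity 1000Mbps, 2 intermediates.
tm = [[0, 600, 400],
      [500, 0, 500],
      [200, 300, 0]]
up, down = vlb_link_loads(tm, 2)
print(conforms_to_hose(tm, 1000, 1000), up, down)
```

In this example no link carries more than 500Mbps, which is the node capacity divided by the number of intermediates. In practice the split is achieved by randomizing per flow rather than dividing each flow exactly, so the equal ratios hold only in expectation.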
A key question we need to validate for performance isolation is whether TCP reacts sufficiently quickly to control
the offered rate of flows within services. TCP works with
packets and adjusts its sending rate on the time scale of
RTTs. Conformance to the hose model, however, requires
instantaneous feedback to avoid oversubscription of traffic
ingress/egress bounds. Our next set of experiments shows
that TCP is “fast enough” to enforce the hose model for traffic in each service so as to provide the desired performance
isolation across services.
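To make the RTT-scale feedback loop concrete, the following toy model shows additive-increase/multiplicative-decrease (AIMD), the rule underlying TCP congestion control, driving two flows that share one bottleneck toward equal rates. It is a sketch under strong assumptions of our own (synchronized feedback once per RTT, fluid rates, no slow start or timeouts) and does not reproduce the experiments described in the text.

```python
def aimd_two_flows(rate_a, rate_b, capacity, rtts, increase=1.0, decrease=0.5):
    """Simulate two AIMD flows sharing one bottleneck, one step per RTT.

    If the combined rate exceeds the bottleneck capacity, both flows back off
    multiplicatively; otherwise both probe for bandwidth additively.
    Returns the list of (rate_a, rate_b) after each RTT.
    """
    history = []
    for _ in range(rtts):
        if rate_a + rate_b > capacity:
            rate_a *= decrease
            rate_b *= decrease
        else:
            rate_a += increase
            rate_b += increase
        history.append((rate_a, rate_b))
    return history


# Hypothetical start: one flow hogging 90 units, the other at 10, bottleneck of 100.
final_a, final_b = aimd_two_flows(90.0, 10.0, 100.0, rtts=500)[-1]
print(abs(final_a - final_b))
```

Additive increase leaves the gap between the two rates unchanged, while each multiplicative decrease halves it, so the rates converge toward an equal share after a number of RTTs that grows only logarithmically with the initial imbalance.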