In this experiment, we add two services to the network.
The first service has a steady network workload, while the
workload of the second service ramps up and down. The
servers of both services are intermingled among the 4 ToRs,
so their traffic mixes at every level of the network. Figure
9 shows the aggregate goodput of both services as a function of time. As seen in the figure, there is no perceptible
change to the aggregate goodput of service one as the flows
in service two start up or complete, demonstrating performance isolation when the traffic consists of large long-lived flows. In Figure 10, we perform a similar experiment,
but service two sends bursts of small TCP connections,
each burst containing progressively more connections.
These two experiments demonstrate that TCP's enforcement of the hose model is sufficient to provide performance isolation across services at timescales greater than a few RTTs (i.e., 1–10ms in data centers).
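As a rough illustration of the second experiment's workload, the sketch below generates bursts of short-lived TCP connections ("mice"), each burst containing more connections than the last. The target address, burst sizes, and payload are placeholder choices of ours, not the paper's actual traffic generator.

    import socket
    import time

    TARGET = ("10.0.0.2", 5001)   # hypothetical sink server for service two
    BURSTS = [1, 2, 4, 8, 16]     # successively larger bursts of mice

    def one_mouse():
        """Open a short-lived TCP connection, send a tiny payload, close."""
        with socket.create_connection(TARGET, timeout=5) as s:
            s.sendall(b"x" * 1024)  # 1KB "mouse" transfer

    for n in BURSTS:
        for _ in range(n):
            one_mouse()
        time.sleep(10)  # idle gap between bursts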
Figure 9. Aggregate goodput of two services with servers intermingled on the ToRs. Service one's goodput is unaffected as service two ramps traffic up and down.

Figure 10. Aggregate goodput of service one as service two creates bursts containing successively more short TCP connections.
5.3. VL2 directory system performance
Finally, we evaluate the performance of the VL2 directory
system, which provides the equivalent semantics of ARP in
layer 2. We perform this evaluation through macro- and
micro-benchmark experiments on the directory system. We
run our prototype on up to 50 machines: 3–5 RSM nodes,
3–7 directory server nodes, and the remaining nodes emulating multiple instances of VL2 agents generating lookups and updates.
Our evaluation supports four main conclusions. First,
the directory system provides high throughput and fast
response time for lookups: three directory servers can
handle 50K lookups/second with latency under 10ms (99th
percentile latency). Second, the directory system can handle
updates at rates significantly higher than the expected
churn rate in typical environments: three directory servers
can handle 12K updates/s within 600ms (99th percentile
latency). Third, our system is incrementally scalable: each
directory server increases the processing rate by about 17K lookups/s and 4K updates/s. Finally, the directory system is robust to failures of individual components (directory or RSM servers) and offers high availability under network churn.
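To make the measurement methodology concrete, here is a minimal sketch of a closed-loop load generator that reports lookup throughput and 99th-percentile latency. The lookup() stub, simulated service time, and address format are hypothetical placeholders, not the actual VL2 agent or test harness.

    import random
    import time

    def lookup(aa):
        """Hypothetical stand-in for a directory-server lookup RPC."""
        time.sleep(random.uniform(0.0002, 0.002))  # simulated 0.2-2ms service time
        return ("10.0.0.1", 0)  # placeholder locator (LA) for the queried AA

    def run_benchmark(num_requests=10_000):
        latencies = []
        start = time.perf_counter()
        for i in range(num_requests):
            t0 = time.perf_counter()
            lookup(f"20.0.{i % 256}.{i % 250 + 1}")  # hypothetical application address
            latencies.append(time.perf_counter() - t0)
        elapsed = time.perf_counter() - start
        latencies.sort()
        p99 = latencies[int(0.99 * len(latencies)) - 1]
        print(f"throughput: {num_requests / elapsed:,.0f} lookups/s")
        print(f"p99 latency: {p99 * 1e3:.2f} ms")

    run_benchmark()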
To understand the incremental scalability of the directory system, we measured the maximum lookup rates
(ensuring sub-10ms latency for 99% of requests) with 3, 5, and
7 directory servers. The result confirmed that the maximum lookup rate increases linearly with the number of directory servers (with each server offering a capacity of 17K
lookups/s). Based on this result, we estimate the worst case
number of directory servers needed for a 100K server data
center. From the concurrent flow measurements (Figure 3), we take the median of 10 correspondents per server in a 100s window. In the worst case, all 100K servers may perform these 10 lookups simultaneously, resulting in one million lookups per second. As noted above,
each directory server can handle about 17K lookups/s under
10ms at the 99th percentile. Therefore, handling this worst
case will require a directory system of about 60 servers (0.06% of all servers in the data center).
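The sizing arithmetic above is simple enough to capture in a few lines; this sketch merely restates the worst-case estimate, with the constants taken from the measurements quoted above.

    # Worst-case directory-system sizing, using the constants from the text.
    SERVERS = 100_000           # servers in the data center
    LOOKUPS_PER_SERVER = 10     # median concurrent correspondents (Figure 3)
    CAPACITY = 17_000           # lookups/s per directory server at sub-10ms p99

    peak_lookups = SERVERS * LOOKUPS_PER_SERVER          # 1,000,000 lookups/s
    directory_servers = -(-peak_lookups // CAPACITY)     # ceiling division -> 59
    print(directory_servers, f"{directory_servers / SERVERS:.2%}")  # ~60, ~0.06%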
6. Discussion
In this section, we address several remaining concerns about the VL2 architecture, including whether other traffic engineering mechanisms might be better suited to the DC than VLB, and the cost of a VL2 network.
Optimality of VLB: As noted in Section 4.2.2, VLB uses randomization to cope with volatility, potentially sacrificing some performance under a best-case traffic pattern by turning all traffic patterns (best-case and worst-case alike) into the average case. This performance loss will
manifest itself as the utilization of some links being higher
than they would under a more optimal traffic engineering
system. To quantify the increase in link utilization VLB will
suffer, we compare VLB’s maximum link utilization with
that achieved by other routing strategies on the VL2 topology for a full day’s traffic matrices (TMs) (at 5 min intervals)
from the data center traffic data reported in Section 3.
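For readers unfamiliar with VLB's mechanics, the following is a minimal sketch of the randomization it relies on: each flow is hashed to a uniformly chosen intermediate switch, so any traffic matrix is spread evenly across the core. The topology, hash choice, and flow key below are illustrative assumptions, not VL2's actual forwarding code.

    import hashlib

    INTERMEDIATE_SWITCHES = [f"int-{i}" for i in range(8)]  # hypothetical core layer

    def vlb_intermediate(src, dst, src_port, dst_port, proto="tcp"):
        """Pick a random-but-stable intermediate switch for a flow.

        Hashing the 5-tuple keeps all packets of one flow on one path
        (avoiding reordering) while spreading distinct flows uniformly
        over the intermediate switches, independent of the traffic matrix.
        """
        key = f"{src}:{src_port}-{dst}:{dst_port}-{proto}".encode()
        digest = int.from_bytes(hashlib.sha1(key).digest()[:8], "big")
        return INTERMEDIATE_SWITCHES[digest % len(INTERMEDIATE_SWITCHES)]

    # Two different flows between the same hosts may bounce off
    # different intermediate switches.
    print(vlb_intermediate("20.0.0.1", "20.0.0.9", 12345, 80))
    print(vlb_intermediate("20.0.0.1", "20.0.0.9", 12346, 80))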
We first compare to adaptive routing, which routes each
TM separately so as to minimize the maximum link utilization for that TM—essentially upper-bounding the best performance that real-time adaptive traffic engineering could
achieve. Second, we compare to best oblivious routing over
all TMs so as to minimize the maximum link utilization.
(Note that VLB is just one among many oblivious routing
strategies.) For adaptive and best oblivious routing, the
routings are computed using the respective linear programs in
CPLEX. The overall utilization for a link in all schemes is
computed as the maximum utilization over all routed TMs.
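Concretely, the adaptive-routing bound for a single TM can be obtained from a standard minimum-congestion multicommodity-flow LP; the formulation sketched below uses our own notation and is only meant to indicate the shape of the programs solved in CPLEX.

\[
\begin{aligned}
&\min\ \lambda \\
&\text{s.t.}\ \sum_{e \in \text{in}(v)} f_{st}(e) - \sum_{e \in \text{out}(v)} f_{st}(e) =
  \begin{cases} d_{st} & v = t \\ -d_{st} & v = s \\ 0 & \text{otherwise} \end{cases}
  \quad \forall (s,t),\ \forall v, \\
&\phantom{\text{s.t.}\ } \sum_{s,t} f_{st}(e) \le \lambda\, c_e \quad \forall e,
  \qquad f_{st}(e) \ge 0.
\end{aligned}
\]

Here f_st(e) is the flow of demand (s,t) carried on link e, d_st is the corresponding TM entry, and c_e is the link capacity; minimizing λ minimizes the maximum link utilization for that TM. Best oblivious routing solves a similar program in which a single routing must satisfy the utilization constraint for every TM of the day simultaneously.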
In Figure 11, we plot the CDF of link utilizations for the three schemes. We normalized the link utilization
numbers so that the maximum utilization on any link for