Figure 4. An example Clos network between Aggregation and
Intermediate switches provides a richly connected backbone well
suited for VLB. The network is built with two separate address
families: topologically significant locator-specific addresses (LAs)
and flat application-specific addresses (AAs).
[Figure 4 diagram labels: Internet; link-state network carrying only LAs (e.g., 10/8); D_A/2 x Intermediate switches (Int), D_I x 10G; Aggregation switches (Aggr), D_A/2 x 10G; ToR switches, 2 x 10G, 20 servers each; 20(D_A D_I/4) x servers in total; fungible pool of servers owning AAs (e.g., 20/8).]
of bandwidth. Further, it is easy and inexpensive to build a
Clos network for which there is no oversubscription (further
discussion on cost is given in Section 6). For example, in
Figure 4, we use D_A-port Aggregation and D_I-port Intermediate switches, and connect these switches such that the
capacity between each layer is D_I·D_A/2 times the link capacity.
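The sizing arithmetic can be sketched in a few lines. This is a minimal illustration, not the authors' tooling: the function name and the returned keys are hypothetical, and the count of D_I Aggregation switches is an inference from the port counts (each Intermediate switch spends one of its D_I ports on each Aggregation switch), not something stated in the text.

```python
def clos_dimensions(d_a, d_i, servers_per_tor=20, tor_uplinks=2):
    """Return switch, link, and server counts for a Figure 4 style Clos network.

    d_a: ports per Aggregation switch; d_i: ports per Intermediate switch.
    """
    if d_a % 2 != 0:
        raise ValueError("d_a must be even: half the ports face up, half face down")
    intermediates = d_a // 2              # one per upward-facing aggregation port
    aggregations = d_i                    # assumption: one per intermediate-switch port
    core_links = intermediates * d_i      # D_I * D_A / 2 links between the two layers
    tor_facing_links = aggregations * (d_a // 2)
    if tor_facing_links % tor_uplinks != 0:
        raise ValueError("ToR uplinks do not divide the aggregation down-links evenly")
    tors = tor_facing_links // tor_uplinks  # D_A * D_I / 4 when each ToR has 2 uplinks
    return {
        "intermediates": intermediates,
        "aggregations": aggregations,
        "core_links": core_links,
        "tors": tors,
        "servers": servers_per_tor * tors,  # 20 * (D_A * D_I / 4)
    }


if __name__ == "__main__":
    print(clos_dimensions(d_a=4, d_i=4))
```

With D_A = D_I = 4 the sketch yields 2 Intermediate switches, 8 links between the layers, 4 ToRs, and 80 servers, matching the 20(D_A D_I/4) server count given in Figure 4.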
The Clos topology is exceptionally well suited for VLB in
that by forwarding traffic through an intermediate switch
that is chosen in a destination-independent fashion (e.g.,
randomly chosen), the network can provide bandwidth guarantees for any traffic matrices that obey the hose model [8].
Meanwhile, routing remains simple and resilient on this
topology—take a random path up to a random intermediate
switch and a random path down to a destination ToR switch.
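As a rough illustration of this path choice, the sketch below picks a random way up, a random Intermediate switch, and a random way down. The switch names, the `tor_to_aggrs` map, and the function name are hypothetical, and the sketch assumes every Aggregation switch links to every Intermediate switch, as in the Clos layout of Figure 4.

```python
import random


def pick_vlb_path(src_tor, dst_tor, tor_to_aggrs, intermediates, rng=random):
    """Return one ToR-to-ToR path through a randomly chosen intermediate switch.

    tor_to_aggrs maps each ToR to the aggregation switches it uplinks to.
    The intermediate is picked without looking at dst_tor, which is what
    makes the bounce destination-independent.
    """
    up_aggr = rng.choice(tor_to_aggrs[src_tor])    # random path up
    intermediate = rng.choice(intermediates)       # random bounce point
    down_aggr = rng.choice(tor_to_aggrs[dst_tor])  # random path down
    return [src_tor, up_aggr, intermediate, down_aggr, dst_tor]


if __name__ == "__main__":
    topology = {"tor1": ["aggr1", "aggr2"], "tor2": ["aggr3", "aggr4"]}
    print(pick_vlb_path("tor1", "tor2", topology, ["int1", "int2"]))
```

A per-packet random choice is used here only for brevity; a real deployment would more likely pin the choice per flow so that packets of one flow stay in order.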
4.2. VL2 addressing and routing
This section explains the motion of packets in a VL2 network, and how the topology, routing design, VL2 agent, and
directory system combine to virtualize the underlying network fabric and create the illusion that hosts are connected
to a big, noninterfering data center–wide layer-2 switch.
Address resolution and packet forwarding: VL2 uses two
separate classes of IP addresses, illustrated in Figure 4. The
network infrastructure operates using LAs; all switches and
interfaces are assigned LAs, and switches run an IP-based
(layer-3) link-state routing protocol that disseminates only
these LAs. This allows switches to obtain complete knowledge about the switch-level topology, as well as forward any
packets encapsulated with LAs along the shortest paths.
On the other hand, applications use permanent AAs, which
remain unaltered no matter how servers’ locations change
due to VM migration or reprovisioning. Each AA (server) is
associated with an LA, the IP address of the ToR switch to
which the application server is connected. The ToR switch
need not be physical hardware—it could be a virtual switch
or hypervisor implemented in software on the server itself!
The VL2 directory system stores the mapping of AAs to
LAs, and this mapping is created when application servers are provisioned to a service and assigned AA addresses.
Resolving these mappings through a unicast-based custom
protocol eliminates the ARP and DHCP scaling bottlenecks
that plague large Ethernets.
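The AA-to-LA mapping can be pictured as a small key-value store. The sketch below is only an in-memory illustration under that assumption: the class and method names are hypothetical and say nothing about how the actual directory system is built or replicated.

```python
class Directory:
    """Toy AA -> LA mapping: an application address maps to its ToR's locator."""

    def __init__(self):
        self._aa_to_la = {}

    def provision(self, aa, tor_la):
        """Record the mapping when a server is assigned its AA."""
        self._aa_to_la[aa] = tor_la

    def migrate(self, aa, new_tor_la):
        """Move a server: the AA stays fixed, only the LA behind it changes."""
        if aa not in self._aa_to_la:
            raise KeyError(f"unknown AA {aa}")
        self._aa_to_la[aa] = new_tor_la

    def resolve(self, aa):
        """Return the LA for an AA, or None if the AA was never provisioned."""
        return self._aa_to_la.get(aa)


if __name__ == "__main__":
    directory = Directory()
    directory.provision("20.0.0.56", "10.0.0.6")
    directory.migrate("20.0.0.56", "10.0.0.4")
    print(directory.resolve("20.0.0.56"))
```

The point of the sketch is that a migration touches one directory entry and nothing in the switches' routing state.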
Packet forwarding: Since AA addresses are not announced
into the routing protocols of the network, for a server to
receive a packet the sending server must first encapsulate
the packet (Figure 5), setting the destination of the outer
header to the LA of the destination AA. Once the packet
arrives at the LA (the destination ToR or hypervisor), the
switch (the ToR or the VM switch in the destination hypervisor) decapsulates the packet and delivers it to the destination AA given in the inner header.
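The encapsulate-then-decapsulate step can be sketched with a nested packet record. This is a simplified model, not a wire format: the `Packet` type and both function names are hypothetical, only a single outer header is shown, and using the sender's ToR LA as the outer source address is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    payload: Union[bytes, "Packet"]  # raw data, or an encapsulated inner packet


def encapsulate(inner, src_la, dst_la):
    """Wrap an AA-addressed packet in an outer header addressed to an LA."""
    return Packet(src=src_la, dst=dst_la, payload=inner)


def decapsulate(outer, my_la):
    """At the destination ToR (or hypervisor switch), strip the outer header."""
    if outer.dst != my_la:
        raise ValueError("packet is not addressed to this LA")
    if not isinstance(outer.payload, Packet):
        raise ValueError("no inner packet to deliver")
    return outer.payload  # delivered to the AA named in the inner header


if __name__ == "__main__":
    inner = Packet(src="20.0.0.55", dst="20.0.0.56", payload=b"hello")
    # Outer source 10.0.0.4 is an assumed value for illustration.
    outer = encapsulate(inner, src_la="10.0.0.4", dst_la="10.0.0.6")
    print(decapsulate(outer, my_la="10.0.0.6"))
```

Switches along the way only ever look at the outer LA header, which is why AAs never need to appear in their routing tables.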
Address resolution and access control: Servers in each
service are configured to believe that they all belong to the
same IP subnet, so when an application sends a packet to an
AA for the first time, the networking stack on the host generates a broadcast ARP request for the destination AA. The
VL2 agent running in the source’s networking stack intercepts the ARP request and converts it to a unicast query to
the VL2 directory system. The directory system answers the
query with the LA of the ToR to which packets should be tunneled. During this resolution process, the directory server
can additionally evaluate the access-control policy between
the source and destination and selectively reply to the resolution query, enforcing necessary isolation policies between
applications.
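A minimal sketch of this interception, assuming the access-control policy can be expressed as a set of permitted (source AA, destination AA) pairs. That policy representation, the class names, and the method names are all hypothetical; the real agent sits in the host networking stack and talks to the directory over the network.

```python
class DirectoryServer:
    """Answers AA lookups, but only for pairs the policy permits."""

    def __init__(self, aa_to_la, allowed_pairs):
        self._aa_to_la = dict(aa_to_la)
        self._allowed = set(allowed_pairs)  # hypothetical policy: (src_aa, dst_aa)

    def lookup(self, src_aa, dst_aa):
        if (src_aa, dst_aa) not in self._allowed:
            return None  # selective non-reply enforces isolation
        return self._aa_to_la.get(dst_aa)


class VL2Agent:
    """Stands in for the host-side agent that intercepts ARP requests."""

    def __init__(self, my_aa, directory):
        self._my_aa = my_aa
        self._directory = directory

    def on_arp_request(self, target_aa):
        """Replace the broadcast ARP with one unicast directory query."""
        return self._directory.lookup(self._my_aa, target_aa)


if __name__ == "__main__":
    server = DirectoryServer(
        {"20.0.0.56": "10.0.0.6"},
        allowed_pairs={("20.0.0.55", "20.0.0.56")},
    )
    print(VL2Agent("20.0.0.55", server).on_arp_request("20.0.0.56"))
    print(VL2Agent("20.0.0.77", server).on_arp_request("20.0.0.56"))
```

The first agent receives the ToR's LA and can start tunneling; the second receives no answer, so it has no way to address packets to the destination.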
These addressing and forwarding mechanisms were chosen for two main reasons. First, they make it possible to use
low-cost switches, which often have small routing tables
(typically just 16K entries) that can hold only LA routes,
without concern for the huge number of AAs. Second, they
allow the control plane to support agility with very little
overhead; the design obviates frequent link-state advertisements to disseminate host-state changes and host/switch
reconfiguration.
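A back-of-the-envelope check of the first reason, under loudly stated assumptions: D_A = D_I = 144 is an illustrative choice, "16K" is read as 16 x 1024 entries, and only one LA route per switch is counted (interface addresses are ignored). The function name is hypothetical, and the switch counts follow the Figure 4 layout with D_I Aggregation switches inferred from the port counts.

```python
TABLE_ENTRIES = 16 * 1024  # "16K entries", read here as 16,384


def address_counts(d_a, d_i, servers_per_tor=20):
    """Return (switch LAs, server AAs) for a Figure 4 style Clos network."""
    intermediates = d_a // 2
    aggregations = d_i          # inferred from the port counts
    tors = d_a * d_i // 4       # each ToR has 2 uplinks
    switch_las = intermediates + aggregations + tors  # one LA route per switch
    server_aas = servers_per_tor * tors
    return switch_las, server_aas


if __name__ == "__main__":
    las, aas = address_counts(d_a=144, d_i=144)  # assumed port counts
    print(f"LA routes: {las}, fits in table: {las <= TABLE_ENTRIES}")
    print(f"AAs: {aas}, would fit in table: {aas <= TABLE_ENTRIES}")
```

Under these assumptions the switches need about 5,400 LA routes, well inside a 16K table, while the roughly 100,000 AAs would not fit if they had to be routed individually.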
Random traffic spreading over multiple paths: To offer hot
Figure 5. VLB in an example VL2 network. Sender S sends packets
to destination D via a randomly chosen intermediate switch using
IP-in-IP encapsulation. AAs are from 20/8, and LAs are from 10/8.
H(ft) denotes a hash of the five-tuple.
[Figure 5 diagram labels: link-state network with LAs (10/8); three Int switches, each labeled (10.1.1.1); ToR switches labeled (20.0.0.1), with LAs (10.0.0.4) and (10.0.0.6) shown; IP subnets with AAs (20/8) containing S (20.0.0.55) and D (20.0.0.56); packet headers shown as H(ft) / 10.1.1.1, H(ft) / 10.0.0.6, and 20.0.0.55 / 20.0.0.56 in front of the payload.]