ing a deluge of error notifications and
resulting route recomputation based
on fluctuating and inconsistent link
status. Some link-layer protocols allow
the link speed to be adjusted downward in hopes of improving the link
quality. Of course, lowering the link
speed results in a reduced bandwidth
link, which in turn may limit the overall
bandwidth of the network or at the very
least will create load imbalance as a result of increased contention across the
slow link. Because of these complicating factors, it is often better to logically
excise the faulty link from the routing
algorithm until the physical link can be
replaced and validated.
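The idea of logically excising a faulty link can be sketched in a few lines of Python. This is a minimal illustration, not the routing algorithm of any particular switch or fabric: the four-switch topology, the `shortest_path` helper, and the `excised` parameter are all hypothetical names chosen for the example, and a breadth-first search on hop count stands in for whatever routing algorithm the network actually uses.

```python
from collections import deque

def shortest_path(links, src, dst, excised=frozenset()):
    """Return a minimum-hop path from src to dst, ignoring excised links.

    links   -- iterable of (a, b) pairs, treated as bidirectional links
    excised -- set of frozenset({a, b}) links to leave out of routing
    """
    # Build the adjacency map from healthy links only.
    adjacency = {}
    for a, b in links:
        if frozenset((a, b)) in excised:
            continue
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    # Breadth-first search, remembering each node's predecessor.
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in adjacency.get(node, []):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None  # dst is unreachable over the remaining links

# Hypothetical topology: two disjoint two-hop routes from A to D.
LINKS = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")]
FAULTY = {frozenset(("A", "B"))}  # link flagged as unreliable

healthy_route = shortest_path(LINKS, "A", "D")
rerouted = shortest_path(LINKS, "A", "D", excised=FAULTY)
print(healthy_route, rerouted)
```

Because the faulty link is simply absent from the graph the search sees, its fluctuating status can no longer trigger route recomputation. The cost is that all traffic between the affected switches shifts onto the remaining paths until the link is repaired.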
The data-center network is generally regarded as a critical design element in the system architecture and
the skeletal structure upon which
processor, memory, and I/O devices
are dynamically shared. The evolution from 1G to 10G Ethernet and the
emerging 40G Ethernet has exposed
performance bottlenecks in the communication stack that require better hardware-software coordination
for efficient communication. Other
approaches by Solarflare, Myricom,
and InfiniBand, among others, have
sought to reshape the conventional
sockets programming model with
more efficient abstractions. Internet
sockets, however, remain the dominant programming interface for data-center networks.
Network performance and reliability are key design goals, but they are
tempered by cost and serviceability
constraints. Deploying a large cluster
computer is done incrementally and
is often limited by the power capacity of the building, with power being
distributed across the cluster network
so that a power failure impacts only a
small fraction—say, less than 10%—of
the hosts in the cluster. When hardware fails, as is to be expected, the
operating system running on the host
coordinates with a higher-level hypervisor or cluster operating system so
that failed components can be replaced in situ
without draining traffic in the cluster. Scalable Web applications are designed to expect occasional hardware
failures, and the resulting software is
by necessity resilient.
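The power-distribution constraint above lends itself to a quick back-of-the-envelope check. The sketch below assumes hosts are spread as evenly as possible across independent power domains. Both that assumption and the names `worst_case_impact` and `min_domains` are illustrative and do not come from any real deployment tool.

```python
def worst_case_impact(num_hosts, num_domains):
    """Fraction of hosts lost when the fullest power domain fails.

    Assumes hosts are assigned as evenly as possible across domains,
    so the fullest domain holds ceil(num_hosts / num_domains) hosts.
    """
    largest_domain = -(-num_hosts // num_domains)  # ceiling division
    return largest_domain / num_hosts

def min_domains(num_hosts, max_fraction):
    """Smallest domain count keeping the impact strictly below max_fraction."""
    for domains in range(1, num_hosts + 1):
        if worst_case_impact(num_hosts, domains) < max_fraction:
            return domains
    return None  # no domain count satisfies the bound

# Hypothetical cluster of 10,000 hosts with a "less than 10%" target.
needed = min_domains(10_000, 0.10)
print(needed, worst_case_impact(10_000, needed))
```

With exactly 10 domains a single failure takes out exactly 10% of the hosts, so the strict "less than 10%" target in this example requires at least 11.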
A good user experience relies on
predictable performance, with the data-center network delivering predictable
latency and bandwidth characteristics
across varying traffic patterns. With
single-thread performance plateauing,
microprocessors are providing more
cores to keep pace with the relentless
march of Moore’s Law. As a result, applications are looking for increasing
thread-level parallelism and scaling
to a large core count with a commensurate increase in communication
among cores. This trend is motivating
communication-centric cluster computing with tens of thousands of cores in
unison, reminiscent of a flock darting
seamlessly amidst the clouds.
Dennis Abts is a member of the technical staff at Google,
where he is involved in the architecture and design of next-generation large-scale clusters. Prior to joining Google,
Abts was a senior principal engineer and system architect
at Cray Inc. He has numerous technical publications and
patents in areas of interconnection networks, data-center
networking, cache-coherence protocols, high-bandwidth
memory systems, and supercomputing.
Bob Felderman spent time at both Princeton and UCLA
before starting a short stint at Information Sciences
Institute. He then helped found Myricom, which became a
leader in cluster-computing networking technology. After
seven years there, he moved to Packet Design where he
applied high-performance computing ideas to the IP and
Ethernet space. He later was a founder of Precision I/O.
All of that experience eventually led him to Google where
he is a principal engineer working on issues in data-center
networking and general platforms system architecture.
© 2012 ACM 0001-0782/12/06 $10.00