Software techniques that tolerate latency
variability are vital to building responsive
large-scale Web services.
By Jeffrey Dean and Luiz André Barroso
Systems that respond to user actions quickly (within 100ms) feel more fluid and natural to users than those that take longer.3 Improvements in Internet connectivity and the rise of warehouse-scale computing systems2 have enabled Web services that provide fluid responsiveness while consulting multi-terabyte datasets spanning thousands of servers; for example, the Google search system updates query results interactively as the user types, predicting the most likely query based on the prefix typed so far, performing the search and showing the results within a few tens of milliseconds. Emerging augmented-reality devices (such as the Google Glass prototype7) will need associated Web services with even greater responsiveness in order to guarantee seamless interactivity.

It is challenging for service providers to keep the tail of latency distribution short for interactive services as the size and complexity of the system scales up or as overall use increases. Temporary high-latency episodes (unimportant in moderate-size systems) may come to dominate overall service performance at large scale. Just as fault-tolerant computing aims to create a reliable whole out of less-reliable parts, large online services need to create a predictably responsive whole out of less-predictable parts; we refer to such systems as "latency tail-tolerant," or simply "tail-tolerant." Here, we outline some common causes for high-latency episodes in large online services and describe techniques that reduce their severity or mitigate their effect on whole-system performance. In many cases, tail-tolerant techniques can take advantage of resources already deployed to achieve fault tolerance, resulting in low additional overhead. We explore how these techniques allow system utilization to be driven higher without lengthening the latency tail, thus avoiding wasteful overprovisioning.
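The point that temporary high-latency episodes, though unimportant in moderate-size systems, come to dominate at large scale can be made concrete with a quick back-of-the-envelope sketch (mine, not the article's; the 1% per-server hiccup rate and the fan-out of 100 servers are illustrative assumptions):

```python
# Sketch: a request that fans out to n servers and must wait for all
# of them is slow if ANY one of them hits a hiccup. Each server is
# assumed to be independently slow with probability p.

def fraction_of_slow_requests(p: float, n: int) -> float:
    """Probability that at least one of n servers is slow."""
    return 1.0 - (1.0 - p) ** n

# A 1-in-100 hiccup is rare for a single server...
print(fraction_of_slow_requests(0.01, 1))              # ≈ 0.01
# ...but with a fan-out of 100 servers, most requests encounter one:
print(round(fraction_of_slow_requests(0.01, 100), 2))  # ≈ 0.63
```

In other words, a hiccup that affects only 1% of responses at a single server affects roughly 63% of requests once a request must touch 100 such servers, which is why the latency tail, not the median, governs perceived responsiveness at scale.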
Why Variability Exists?
Variability of response time that leads to high tail latency in individual components of a service can arise for many reasons, including:
Shared resources. Machines might be shared by different applications contending for shared resources (such as CPU cores, processor caches, memory bandwidth, and network bandwidth), and within the same application different requests might contend for resources.
Daemons. Background daemons may use only limited resources on average, but when scheduled can generate multi-millisecond hiccups.
key insights

- Even rare performance hiccups affect a significant fraction of all requests in large-scale distributed systems.
- Eliminating all sources of latency variability in large-scale systems is impractical, especially in shared environments.
- Using an approach analogous to fault-tolerant computing, tail-tolerant software techniques form a predictable whole out of less-predictable parts.
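One well-known tail-tolerant technique that forms a predictable whole out of less-predictable parts is the hedged request: send a request to one replica, and if no reply arrives within a short deadline, send a backup copy to a second replica and use whichever answer comes back first. The sketch below is illustrative, not the article's implementation; the `fetch` function, the replica tuples, and the 50ms hedge delay are all assumptions made for the example:

```python
import concurrent.futures
import time

def hedged_request(fetch, replicas, hedge_delay=0.05):
    """Send to replicas[0]; if no reply within hedge_delay,
    also send to replicas[1] and return the first reply."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)
    futures = [pool.submit(fetch, replicas[0])]
    try:
        # Fast path: the primary replica replies within the hedge delay.
        return futures[0].result(timeout=hedge_delay)
    except concurrent.futures.TimeoutError:
        # Slow path: hedge with the backup replica and take whichever
        # reply arrives first (a real system would also cancel or
        # deprioritize the losing request).
        futures.append(pool.submit(fetch, replicas[1]))
        done, _ = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        pool.shutdown(wait=False)  # do not block on the losing replica

# Illustrative use: one replica hits a latency hiccup, the other is healthy.
def fetch(replica):
    name, latency = replica
    time.sleep(latency)
    return name

print(hedged_request(fetch, [("slow-replica", 0.5), ("fast-replica", 0.01)]))
# → "fast-replica"
```

In the common case the hedge never fires, so the extra load is small; picking the hedge delay near the expected tail (for example, the 95th-percentile latency) is what keeps the duplicate-request overhead to a few percent while sharply shortening the tail.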