of writing and running Gmail, there
was a major increase in its adoption
and user happiness.
Many techniques are available to
application developers for improving
client-side response times, and not
all of them require large engineering investments. Google’s PageSpeed
project was created to share with the
world the company’s insights into
client-side response optimization,
accompanied by tools that help engineers apply these insights to their
own products and Web pages.⁵ One of
the obvious rules is to reduce server
response time as much as possible.
PageSpeed analysis tools also recommend various well-known techniques
for client-side optimization, including compression of static content,
using a preprocessor to “minify” code
(HTML, CSS, and JavaScript) by removing unnecessary and redundant
text, setting cache-control headers
correctly, compressing or inlining
images, and more.
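Two of these techniques, compressing static content and setting cache-control headers, fit in a few lines. The following is a minimal Python sketch using only the standard `gzip` module. The function name `prepare_static_asset` and the one-day `max-age` are hypothetical choices, minification is not shown, and the `Accept-Encoding` parsing is deliberately simplified (it ignores q-values).

```python
import gzip

def prepare_static_asset(body: bytes, accept_encoding: str, max_age_seconds: int = 86_400):
    """Return (headers, payload) for a static file: cacheable, and gzip-compressed if allowed."""
    headers = {
        # Allow browsers and shared caches to reuse the asset for max_age_seconds.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Caches must key on Accept-Encoding, because the payload depends on it.
        "Vary": "Accept-Encoding",
    }
    # Simplified parsing: q-values such as "gzip;q=0" are ignored.
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" in offered:
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    else:
        payload = body
    headers["Content-Length"] = str(len(payload))
    return headers, payload

# Usage: repetitive markup compresses well.
html = b"<p>" + b"hello " * 500 + b"</p>"
headers, payload = prepare_static_asset(html, "gzip, deflate, br")
print(headers["Content-Encoding"], len(html), len(payload))
```

A production server would normally precompress assets once rather than on every request. The point here is only that both techniques are cheap to adopt.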
To recap, measure the actual user
experience: how long a user must wait
for a response after performing an action on your product. Do
this even though it is often not easy;
experience says it will be well worth
the effort.
Lesson 2. Measure Speed at the 95th and 99th Percentiles
While “Speed matters” is a good axiom
when thinking about user (un)happiness,
it still leaves an open question
about how best to quantify the speed
of a service. In other words, even if you
understand and accept that the value of the
latency metric (time to respond to user
requests) should be low enough to keep
users happy, do you know precisely
what metric that is? Should you measure average latency, median latency,
or nth-percentile latency?
In the early days of Google’s SRE
organization, when we managed relatively
few products other than Search
and Ads, SLOs (service-level objectives)
were set for speed based on
median latency. (An SLO is a target
value for a given metric, used to communicate
the desired level of performance
for a service. When the target
is achieved, that aspect of the service
is considered to be performing adequately.
In the context of SLOs, the
metric being evaluated is called an
SLI, or service-level indicator.)
Over the years, particularly as the
use of Search expanded to other continents,
we learned that users could be
unhappy even when we were meeting
and beating our SLO targets. We then
conducted research to determine the
impact of slight degradations in response
time on user behavior, and
found that users would conduct significantly
fewer searches when encountering
incremental delays as small
as 200 milliseconds.³ Based on these
and other findings, we have learned
to measure “long-tail” latency—that
is, latency must be measured at the
95th and 99th percentiles to capture the
user experience accurately. After all, it
doesn’t matter if a product is serving
the correct result 99.999% of the time
if 5% of users are unhappy with how
long it takes to get that correct result.
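The difference between the candidate metrics is easy to see on a small, skewed sample. This Python sketch uses the nearest-rank percentile definition (one of several common conventions) and a hypothetical set of 100 latencies in which 4% of requests are very slow.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank in ascending order
    return ordered[rank - 1]

# Hypothetical latencies in ms: 96 fast requests and a 4% tail of very slow ones.
latencies_ms = [200] * 96 + [3000] * 4

print("mean:  ", statistics.mean(latencies_ms))
print("median:", statistics.median(latencies_ms))
print("p95:   ", percentile(latencies_ms, 95))
print("p99:   ", percentile(latencies_ms, 99))
```

Here the mean (312 ms) and the median (200 ms) both look healthy, and even the 95th percentile (200 ms) misses the slow 4%. Only the 99th percentile (3,000 ms) exposes the tail, which is why it pays to watch both percentiles.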
Once upon a time, Google measured
only raw availability. In fact, most SLOs
even today are framed around availability:
How many requests return a
good result versus how many return an
error. Availability was computed the
following way:
$$\text{Availability} = 1 - \frac{\text{error responses}}{\text{total requests}}$$
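Availability, defined as 1 minus the ratio of error responses to total requests, maps directly onto an SLI and an SLO check. This is a minimal Python sketch. The function names and the 99.9% target are hypothetical, and treating a window with no traffic as fully available is an assumption.

```python
def availability(error_responses: int, total_requests: int) -> float:
    """Raw availability SLI: the fraction of requests that did not return an error."""
    if total_requests == 0:
        return 1.0  # assumption: a window with no traffic counts as available
    return 1 - error_responses / total_requests

def meets_availability_slo(error_responses: int, total_requests: int,
                           target: float = 0.999) -> bool:
    """SLO check: compare the SLI against a target (0.999 is a hypothetical value)."""
    return availability(error_responses, total_requests) >= target

# Usage: 5 errors out of 1,000 requests gives 99.5%, which misses a 99.9% target.
print(availability(5, 1000), meets_availability_slo(5, 1000))
```

Note that nothing in this computation looks at how long a request took. A slow success counts exactly the same as a fast one.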
Suppose you have a user service that
normally responds in half a second,
which sounds good enough for a user
on a smartphone, given typical wireless network delays. Now suppose one
request in 30 has an internal problem
causing a delay that leads to the mobile
client app retrying the request after 10
seconds. Now further suppose the retry almost always succeeds. The availability metrics (as computed here) will
say “100% availability.” Users will say
“97% available”—because if they are
accustomed to receiving a response in
500 milliseconds, after three to five seconds they will hit retry or switch apps.
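The gap between the two numbers in this scenario can be worked out directly. This Python sketch uses the scenario's figures: 1 request in 30 is delayed, the client retries after 10 seconds, and the retry succeeds. The 3-second give-up point is an assumption taken from the low end of the "three to five seconds" range.

```python
# Figures from the scenario; PATIENCE_MS is an assumed give-up point.
REQUESTS = 30
NORMAL_MS = 500                # usual response time
SLOW_MS = 10_000 + NORMAL_MS   # client waits 10 s, then the retry succeeds
PATIENCE_MS = 3_000            # users hit retry or switch apps after this long

latencies_ms = [NORMAL_MS] * (REQUESTS - 1) + [SLOW_MS]
error_responses = 0            # every request eventually returns a good result

raw_availability = 1 - error_responses / REQUESTS
user_perceived = sum(ms <= PATIENCE_MS for ms in latencies_ms) / REQUESTS

print(f"raw: {raw_availability:.1%}, user-perceived: {user_perceived:.1%}")
# prints "raw: 100.0%, user-perceived: 96.7%"
```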
It doesn’t matter if the user documentation says, “The application may take
up to 10 seconds to respond;” once the
user base is trained to get an answer in
500 milliseconds most of the time, that
is what they will expect, and they will be-
There is a technique to phrasing SLO definitions optimally—a linguistic point
illustrated here with an amusing puzzle. Consider these two alternative SLO definitions
for a given Web service, using slightly different language in each definition:
1. The 99th-percentile latency for user requests, averaged over a trailing five-minute
time window, will be less than 800 milliseconds.
2. Some 99% of user requests, averaged over a trailing five-minute time window, will
complete in less than 800 milliseconds.
Assume the SLO will be measured every 10 seconds in either case, and an alert will
be fired if N consecutive measurements are out of range. Before reading further, think
about which SLO definition is better, and why.
The answer is that from a user-happiness perspective, the two SLOs are practically
equivalent; and yet, from a computational perspective, alternative number 2 is
distinctly superior.
To appreciate this, consider a hypothetical Web service receiving 10,000 user
requests per second, on average, under peak load conditions. With SLO definition
1, the measurement algorithm actually has to compute a percentile value every 10
seconds. A naive approach to this computation is as follows:
• Store the response times for 10,000 × 300 = 3 million queries in memory to capture
five minutes’ worth of data (this will use >11MB of memory to store 3 million 32-bit integers, each representing the response time for one query in milliseconds).
• Sort these 3 million integer values.
• Read the 99th-percentile value (that is, the 30,000th latency value in the sorted list,
counting from the maximum downward).
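The naive steps above can be sketched in Python as follows. The function name `naive_percentile` is hypothetical. The `array("I")` type code stores unsigned integers whose size is platform-dependent, typically 4 bytes, so the memory figure applies only under that assumption.

```python
from array import array

def naive_percentile(latencies_ms, percentile=99):
    """Definition 1, done naively: sort every stored latency and read off the percentile."""
    ordered = sorted(latencies_ms, reverse=True)              # slowest request first
    tail = max(1, len(ordered) * (100 - percentile) // 100)   # 30,000 for 3 million samples
    return ordered[tail - 1]                                  # e.g. the 30,000th-slowest value

# Five minutes of data at 10,000 requests/second, one unsigned int (ms) per request.
window = array("I", [0]) * (10_000 * 300)
# 12,000,000 bytes (about 11.4 MiB) on platforms where itemsize is 4.
print(len(window), "values,", len(window) * window.itemsize, "bytes")
```

Every 10-second measurement repeats the full sort over all 3 million values.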
More efficient algorithms are definitely available, such as using 16-bit short integers
for latency values and using two heaps instead of sorting a linear list every 10 seconds,
but even these improved approaches involve significant overhead.
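The two-heap idea can be sketched in Python with the standard `heapq` module. A max-heap holds the faster requests and a min-heap holds the slowest 1%, so the percentile is always at the top of the second heap. The class name is hypothetical. This sketch also omits expiring samples that fall out of the trailing five-minute window. Handling that expiry (for example, with lazy deletion) is where much of the extra overhead comes from.

```python
import heapq

class TwoHeapPercentile:
    """Streaming percentile tracker: a max-heap below the cut, a min-heap above it.

    Sketch only: old samples are never expired, so this does not implement
    the trailing five-minute window that definition 1 requires.
    """

    def __init__(self, percentile=99):
        self.percentile = percentile
        self.below = []  # max-heap via negated values: the faster requests
        self.tail = []   # min-heap: the slowest (100 - percentile)% of requests

    def add(self, latency_ms):
        # Place the sample on the correct side of the current cut.
        if self.tail and latency_ms >= self.tail[0]:
            heapq.heappush(self.tail, latency_ms)
        else:
            heapq.heappush(self.below, -latency_ms)
        # Rebalance so the tail heap holds max(1, n * (100 - percentile) // 100) samples.
        total = len(self.below) + len(self.tail)
        wanted = max(1, total * (100 - self.percentile) // 100)
        while len(self.tail) > wanted:
            heapq.heappush(self.below, -heapq.heappop(self.tail))
        while len(self.tail) < wanted and self.below:
            heapq.heappush(self.tail, -heapq.heappop(self.below))

    def value(self):
        """Fastest latency in the tail, i.e. the same value the sort-based method reads."""
        return self.tail[0]
```

Each sample costs O(log n) heap work instead of a full sort, but all samples in the window must still be kept in memory.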
In contrast, SLO definition 2 requires storing only two integers in memory: the
count of user requests with completion times greater than 800 milliseconds, and the
total count of user requests. Determining SLO compliance is then a simple division
operation, and you don’t have to remember latency values at all.
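Definition 2 needs only two counters, as this Python sketch shows. The class and method names are hypothetical, and treating an empty window as compliant is an assumption. The compliance check uses integer arithmetic so that a result of exactly 99% is not affected by floating-point rounding.

```python
class ThresholdLatencySLI:
    """Definition 2: count slow requests and all requests; no latencies are stored."""

    def __init__(self, threshold_ms=800):
        self.threshold_ms = threshold_ms
        self.slow = 0   # requests that took longer than threshold_ms
        self.total = 0  # all requests

    def record(self, latency_ms):
        self.total += 1
        if latency_ms > self.threshold_ms:
            self.slow += 1

    def fast_fraction(self):
        # One division yields the SLI; an empty window counts as compliant (assumption).
        return 1.0 if self.total == 0 else (self.total - self.slow) / self.total

    def meets_slo(self, target_percent=99):
        # Integer comparison avoids floating-point edge cases at exactly 99%.
        return (self.total - self.slow) * 100 >= target_percent * self.total
```

This sketch accumulates counts since it was created. To honor the trailing five-minute window, one option is to keep a pair of counters for each 10-second measurement interval, 30 pairs in a ring, and sum them. That is still only 60 integers rather than 3 million.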
Be sure to define your long-tail latency SLOs using format 2.
How to Define Percentile-Based SLOs