contributed articles
ILLUSTRATION BY PETER CROWTHER ASSOCIATES
THE COMPUTER SYSTEMS we use today make it easy for programmers to mitigate event latencies in the nanosecond and millisecond time scales (such as DRAM accesses at tens or hundreds of nanoseconds and disk I/Os at a few milliseconds) but significantly lack support for microsecond (µs)-scale events. This oversight is quickly becoming a serious problem for programming warehouse-scale computers, where efficient handling of microsecond-scale events is becoming paramount for a new breed of low-latency I/O devices ranging from datacenter networking to emerging memories (see the first sidebar "Is the Microsecond Getting Enough Respect?").
Processor designers have developed multiple techniques to facilitate a deep memory hierarchy that works at the nanosecond scale by providing a simple synchronous programming interface to the memory system. A load operation will logically block a thread's execution, with the program appearing to resume after the load completes. A host of complex microarchitectural techniques make high performance possible while supporting this intuitive programming model. Techniques include prefetching, out-of-order execution, and branch prediction. Since nanosecond-scale devices are so fast, low-level interactions are performed primarily by hardware.
At the other end of the latency-mitigating spectrum, computer scientists have worked on a number of techniques, typically software based, to deal with the millisecond time scale. Operating system context switching is a notable example. For instance, when a read() system call to a disk is made, the operating system kicks off the low-level I/O operation but also performs a software context switch to a different thread to make use of the processor during the disk operation. The original thread resumes execution sometime after the I/O completes. The long overhead of making a disk access (milliseconds) easily outweighs the cost of two context switches (microseconds). Millisecond-scale devices are slow enough that the cost of these software-based mechanisms can be amortized (see Table 1).
These synchronous models for interacting with nanosecond- and millisecond-scale devices are simpler to program than the asynchronous alternative. In an asynchronous programming model, the program sends a request to a device and continues processing other work until the request completes.
Attack of the Killer Microseconds

DOI: 10.1145/3015146

Microsecond-scale I/O means tension between performance and productivity that will need new latency-mitigating ideas, including in hardware.

BY LUIZ BARROSO, MIKE MARTY, DAVID PATTERSON, AND PARTHASARATHY RANGANATHAN
key insights

˽ A new breed of low-latency I/O devices, ranging from faster datacenter networking to emerging non-volatile memories and accelerators, motivates greater interest in microsecond-scale latencies.

˽ Existing system optimizations targeting nanosecond- and millisecond-scale events are inadequate for events in the microsecond range.

˽ New techniques are needed to enable simple programs to achieve high performance when microsecond-scale latencies are involved, including new microarchitecture support.