Technical Perspective
For Better or Worse, Benchmarks Shape a Field
By David Patterson
As in other IT fields, computer architects initially reported incomparable
results. We quickly saw the folly of
this approach. We then went through
a sequence of performance metrics,
with each being an improvement on
its predecessor: average instruction
time, millions of instructions per second (MIPS), millions of floating point
operations per second (MEGAFLOPS),
synthetic program performance
(DHRYSTONES), and ultimately average performance improvement relative to a reference computer based on a
suite of real programs (SPEC CPU).
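As a sketch of the last metric in that sequence: SPEC CPU computes, for each program, the ratio of a reference computer's run time to the measured run time, and summarizes the suite with the geometric mean of those ratios. The function name `spec_style_score` and the run times below are hypothetical, chosen only for illustration; this is not SPEC's own tooling.

```python
import math


def spec_style_score(reference_times, measured_times):
    """Geometric mean of per-program speedups over a reference computer.

    Hypothetical helper for illustration. Each ratio is
    reference time / measured time, so 2.0 means the machine under test
    ran that program twice as fast as the reference.
    """
    if not reference_times or len(reference_times) != len(measured_times):
        raise ValueError("need one measured time per reference time")
    ratios = [ref / t for ref, t in zip(reference_times, measured_times)]
    # Averaging logarithms and exponentiating gives the geometric mean.
    # With it, the ratio of two machines' scores does not depend on which
    # reference computer was chosen.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical run times in seconds for three programs.
reference = [100.0, 200.0, 400.0]
measured = [50.0, 100.0, 100.0]
print(round(spec_style_score(reference, measured), 3))  # prints 2.52
```

Here the per-program speedups are 2, 2, and 4, so the summary is the cube root of 16, about 2.52.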
When a field has good benchmarks, we settle debates and the field makes rapid progress. Indeed, the acceleration in computer performance from 25% to 50% per year starting in the mid-1980s is due in part to Moore's Law and in part to our ability to fairly compare competing designs. Similarly, computer vision made dramatic advances in the last decade after it embraced benchmarks to evaluate innovations in vision algorithms.[a]
Sadly, when a field has bad benchmarks, progress can be problematic. For example, even though DHRYSTONES have been discredited in textbooks since 1990,[b] embedded computing still reports them when making performance claims. How do we know whether a new embedded processor is a genuine breakthrough, or simply the result of cynical benchmarketering, in that it runs the benchmark quickly but real programs slowly? The answer is that we cannot know from DHRYSTONE reports.
[a] D. Martin, C. Fowlkes, D. Tal, and J. Malik, "A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics," Proc. 8th Int'l Conf. Computer Vision (July 2001), 416–423.
[b] J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach, 1st Edition, Morgan Kaufmann, 1990.

In the following paper, the authors point out that while computer architecture has a glorious past, it has become a victim of its own success. The SPEC organization has been selecting old programs written in old languages that reflect the state of programming in the 1980s. Given the 1,000,000X improvement in cost-performance since C++ was unveiled in 1979, most programmers have moved on to more productive languages. Indeed, a recent survey supports that claim: only 25% of programs are being written in languages like C and C++.[c] Hence, the authors supplement SPEC's C and C++ programs, which manage storage manually, with Java programs that manage storage automatically. They call the former native and the latter managed.
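Benchmark matrix (language by scalability):

                             C and C++ (native)     Java (managed)
Sequential (non-scalable)    native, non-scalable   managed, non-scalable
Parallel (scalable)          native, scalable       managed, scalable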
The paper reflects a second important trend. The limit on how much power a chip can dissipate forced microprocessor manufacturers to switch from a single high-clock-rate processor per chip to multiple processors, or cores, per chip. Thus, the authors include both sequential and parallel benchmarks; they call the former non-scalable and the latter scalable. Moreover, the authors report on power and energy in addition to performance. In this PostPC Era, battery life can trump performance in the client, and the architects of warehouse-scale computers try to optimize the costs of powering and cooling 100,000 servers as well as to improve cost-performance. Just as we learned that measuring time in seconds is a safer measure of program performance than a rate like MIPS, we are learning that energy in joules is a better measure than a rate like watts, which is just joules per second. The authors report both watts and joules in addition to relative performance.
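The point about rates can be made concrete with energy = average power × time. In this sketch the chip names and numbers are made up for illustration, and treating power as constant over a run is a simplifying assumption: the chip with the lower power rating still consumes more energy because it takes longer to finish the same work, which a watts-only comparison would hide.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One hypothetical benchmark run: average power (watts) and run time (seconds)."""
    name: str
    avg_power_watts: float
    time_seconds: float

    def energy_joules(self) -> float:
        # Assumes constant average power over the run, so energy reduces to
        # watts * seconds = joules.
        return self.avg_power_watts * self.time_seconds


# Hypothetical numbers: chip A draws less power but runs much longer.
a = Run("chip A", avg_power_watts=10.0, time_seconds=30.0)
b = Run("chip B", avg_power_watts=25.0, time_seconds=8.0)

for run in (a, b):
    print(f"{run.name}: {run.avg_power_watts} W, {run.energy_joules()} J")
# chip A: 10.0 W, 300.0 J
# chip B: 25.0 W, 200.0 J
```

Ranked by watts, chip A looks better; ranked by joules for the same task, chip B does.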
[c] R.S. King, "The Top 10 Programming Languages," IEEE Spectrum (Oct. 2011); http://spectrum.ieee.org/at-work/tech-careers/the-top-10-programming-languages/
Given this measurement framework, the authors then measure eight very different Intel microprocessors built over a seven-year period. They evaluate these eight microprocessors using 61 programs, each of which fits into one of the four quadrants of the matrix shown here.
David Patterson is the Pardee Professor of Computer Science at the University of California at Berkeley.