Part I: The End of Innocence
In 2000, the roadmap ahead for desktop processing seemed clear to many: processors with ever deeper pipelines and ever faster clock frequencies would scale performance into the future.1,17 Researchers, myself among them, focused on the consequences of this trend, such as the wire-delay problem. The hypothesis was that clock frequencies would grow so fast, and wires remain so slow, that it would take tens of cycles to send information across a large chip. The microarchitectures we build and ship today are simply not equipped to work under such delay constraints.
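For intuition, a back-of-the-envelope check with assumed numbers (none of these figures appear in the article): at a hypothetical 10GHz clock, one cycle lasts 100 picoseconds. Even a signal propagating at half the speed of light covers only about 15mm in that time, and real on-chip wires, limited by RC delay and repeaters, are considerably slower, so crossing a 20mm die could plausibly cost several to tens of cycles.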
However, faster clocks and deeper pipelines ran into more fundamental problems. Deeply pipelined processors are extremely complex to design and validate, and as designers struggled to manage that complexity, the approach was also yielding diminishing performance returns. Additional pipeline stages stretched the critical loops3 in the processor, increasing the number of cycles on the critical path of execution. Finally, while those extra pipeline stages enabled processors to be clocked faster, a linear increase in clock frequency creates a cubic increase in power consumption.6 The power that a commodity desktop processor can consume while remaining economically viable is capped by packaging and cooling costs, which exert downward pressure on clock frequency.
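A rough sketch of where that cubic relationship comes from, using the standard CMOS dynamic-power model (the model itself is not spelled out in this article): dynamic power is

    P_dynamic ∝ C · V² · f

where C is the switched capacitance, V the supply voltage, and f the clock frequency. Because supply voltage must be raised roughly in proportion to frequency for circuits to switch reliably at higher speeds, V ∝ f, and therefore P_dynamic ∝ f · f² = f³. A linear gain in frequency is paid for with a cubic increase in power.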
[Figure 1: CPU clock speed (MHz, 0 to 4,000) for SPARC, AMD, Intel, and PowerPC processors, November 1988 to April 2005.]
[Figure 2: SPEC integer CPU performance over a 25-year time span, measured with SPEC 92, SPEC 95, and SPEC 2000, 1987 to 2004.]
Collectively, these effects manifested themselves as a distinct change in the growth of processor frequency in 2004 (as indicated in Figure 1). Intel, in fact, stepped back from aggressive clock scaling in the Pentium 4 and in later products such as the Core 2. AMD never attempted to build processors at the same frequencies as Intel, and consequently suffered in the marketing game, whereby consumers erroneously assume frequency is the only indicator of CPU performance.
Clock frequency is clearly not the same thing as performance. CPU performance must be measured by observing the execution time of real applications. Reasonable people can argue about the validity of the SPEC benchmark suite, and most would admit it underrepresents memory and I/O effects. Nevertheless, when we consider the much larger trends in performance over several years, it is a reliable indicator of the progress computer architects and silicon technology have made.
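As a concrete illustration of measuring performance via execution time, here is a minimal sketch in Python of how a SPEC-style integer score is formed: each benchmark's measured run time is compared against a fixed reference machine's time, and the per-benchmark ratios are combined with a geometric mean. The benchmark names and timings below are made up for illustration, not actual SPEC reference data.

    # Sketch of a SPEC-style score; names and timings are hypothetical.
    from math import prod

    ref_times = {"gcc": 1100.0, "mcf": 1800.0, "bzip2": 965.0}  # reference machine, seconds
    measured  = {"gcc":  310.0, "mcf":  720.0, "bzip2": 280.0}  # machine under test, seconds

    ratios = [ref_times[b] / measured[b] for b in ref_times]    # per-benchmark speedups
    score  = prod(ratios) ** (1.0 / len(ratios))                # geometric mean
    print(f"SPEC-style score: {score:.2f}")

The geometric mean is used so that no single benchmark's ratio dominates the aggregate, which matches how SPEC itself summarizes a suite.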
Figure 2 depicts CPU performance from 1982 to 2007, as measured by several different generations of SPEC integer benchmarks. The world changed by June 2004. Examining this 25-year time span, now with four years of hindsight, it is clear we have a problem: we are no longer able to improve the performance of single-threaded applications at an exponential rate.
The fact that we have been able to improve performance at that rate in the past has been a tremendous boon for the IT industry. Imagine if other industries, such as the auto or airline business, had at their core a driving source of exponential improvement. How would those industries change if miles per gallon or transport speed doubled every two years? Exponential performance improvement drives down cost and improves the user experience by enabling ever richer applications. In fact, manufacturing and materials engineers, computer architects, and compiler writers have been so effective at translating Moore's Law's exponential increase in chip resources23 into exponential performance improvements that many people erroneously use the two terms interchangeably. The question before us as a research field and an industry is, now that we no longer know how to translate Moore's Law growth in avail-