cluding approximately 15 branches,
as they represent approximately 25%
of executed instructions. To keep the
pipeline full, branches are predicted
and code is speculatively placed into
the pipeline for execution. The use
of speculation is both the source of
ILP performance and of inefficiency.
When branch prediction is perfect,
speculation improves performance
yet involves little added energy cost—
it can even save energy—but when it
“mispredicts” branches, the processor must throw away the incorrectly
speculated instructions, and their
computational work and energy are
wasted. The internal state of the processor must also be restored to the state that existed before the mispredicted branch, expending additional time and energy.
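As a rough back-of-the-envelope check (my own illustration, assuming the 15 in-flight branches mentioned above and an arbitrary 10% waste budget), one can compute how accurate each individual branch prediction must be for speculation to pay off:

```python
# Sketch, not from the article: with 15 branches in flight, all of them
# must be predicted correctly for the speculated work to be kept, which
# happens with probability p**15 for per-branch accuracy p.
def all_correct(p, branches=15):
    return p ** branches

# Per-branch accuracy needed so that at most 10% of speculated work is wasted
required = 0.90 ** (1 / 15)
print(f"required per-branch accuracy: {required:.3f}")        # 0.993

# Even a seemingly good 98% predictor keeps all 15 correct far less often
print(f"all 15 correct at 98% accuracy: {all_correct(0.98):.0%}")  # 74%
```

The striking point of the arithmetic is that a per-branch accuracy of about 99.3% is needed just to keep the wasted work near 10%.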
To see how challenging such a design is, consider the difficulty of correctly predicting the outcomes of those 15 branches.
The dominant set of ISA principles for general-purpose processors today is still RISC, 35 years after their introduction.
Current Challenges for Computer Architecture
“If a problem has no solution, it may
not be a problem, but a fact—not to be
solved, but to be coped with over time.”
—Shimon Peres
While the previous section focused
on the design of the instruction set
architecture (ISA), most computer
architects do not design new ISAs
but implement existing ISAs in the
prevailing implementation technology. Since the late 1970s, the technology of choice has been metal–oxide semiconductor (MOS)-based integrated circuits, first n-type metal–oxide semiconductor (nMOS) and then complementary metal–oxide semiconductor (CMOS). The stunning rate of improvement in MOS technology—captured in Gordon Moore’s predictions—has been the driving factor enabling architects to design more aggressive methods for achieving
performance for a given ISA. Moore’s original prediction in 1965²⁶ called for a doubling in transistor density yearly; in 1975, he revised it, projecting a doubling every two years, a forecast that eventually became known as Moore’s Law. Because transistor density grows quadratically while speed grows linearly, architects used more transistors to improve performance.
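That quadratic-versus-linear point can be sketched with idealized classical-scaling numbers (my own illustration, not figures from the article):

```python
# Idealized sketch: shrinking the feature size by a factor k fits k**2
# more transistors into the same area, while raw transistor speed
# improves only by roughly k.
def scaled(k, transistors=1.0, speed=1.0):
    return transistors * k**2, speed * k

t, s = scaled(2.0)  # one full halving of feature size
print(t, s)  # 4.0 2.0 -> four times the transistors, twice the speed
```

Hence the historical emphasis on spending transistors (caches, wider issue, more cores) rather than waiting for speed alone.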
End of Moore’s Law and Dennard Scaling
Although Moore’s Law held for many decades (see Figure 2), it began to slow sometime around 2000, and by 2018 there was a roughly 15-fold gap between Moore’s prediction and current capability, an outcome Moore himself predicted in 2003 was inevitable.²⁷ The current expectation is that the gap will continue to grow as CMOS technology approaches fundamental limits.
Accompanying Moore’s Law was a projection made by Robert Dennard called “Dennard scaling,”⁵ stating that as transistor density increased, power consumption per transistor would drop, so the power per mm² of silicon would be near constant. Since the computational capability of a mm² of silicon was increasing with each new generation of technology, computers would become more energy efficient. Dennard scaling began to slow significantly in 2007 and faded to almost nothing by 2012 (see Figure 3).
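The constant-power-per-area claim follows from the textbook dynamic-power relation P ≈ C·V²·f (an assumed standard model, not a formula given in the text), combined with the classical scaling of capacitance, voltage, and frequency:

```python
# Idealized Dennard-scaling sketch: under classical scaling by factor k,
# capacitance C -> C/k, supply voltage V -> V/k, frequency f -> f*k,
# while transistor density rises by k**2.
def power_per_mm2(k, c=1.0, v=1.0, f=1.0, density=1.0):
    c, v, f = c / k, v / k, f * k
    per_transistor = c * v * v * f            # scales as 1/k**2
    return per_transistor * (density * k**2)  # density cancels the drop

print(power_per_mm2(1.0), power_per_mm2(2.0))  # 1.0 1.0 -> constant power/area
```

When voltage stopped scaling down with feature size, the 1/k² drop in per-transistor power disappeared, which is exactly the fade of Dennard scaling described above.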
Between 1986 and about 2002, the exploitation of instruction-level parallelism (ILP) was the primary architectural method for gaining performance and, along with improvements in the speed of transistors, led to an annual performance increase of approximately 50%. The end of Dennard scaling meant architects had to find more efficient ways to exploit parallelism.
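Compounding that cited ~50%-per-year figure shows how large the ILP-era gains were (a rough check assuming exactly 50% every year over the 1986–2002 span):

```python
# Compound the ~50%/year performance growth cited for 1986-2002.
growth = 1.5 ** (2002 - 1986)
print(f"{growth:.0f}x")  # 657x overall improvement over 16 years
```
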
To understand why increasing ILP
caused greater inefficiency, consider
a modern processor core like those
from ARM, Intel, and AMD. Assume it
has a 15-stage pipeline and can issue
four instructions every clock cycle. It
thus has up to 60 instructions in the
pipeline at any moment in time, in-
Figure 4. Wasted instructions as a percentage of all instructions completed on an Intel
Core i7 for a variety of SPEC integer benchmarks.
Figure 5. Effect of Amdahl’s Law on speedup as a fraction of clock cycle time in serial