datapath, along with instruction and
data caches, in a single chip.
For example, Figure 1 shows the
RISC-I8 and MIPS12 microprocessors
developed at the University of California, Berkeley, and Stanford University
in 1982 and 1983, respectively, that
demonstrated the benefits of RISC.
These chips were eventually presented
at the leading circuit conference, the
IEEE International Solid-State Circuits
Conference, in 1984.33, 35 It was a remarkable moment when a few graduate students at Berkeley and Stanford
could build microprocessors that were
arguably superior to what industry
could build.
These academic chips inspired
many companies to build RISC microprocessors, which were the fastest for
the next 15 years. The explanation follows from the formula for processor performance:

$$\frac{\text{Time}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Time}}{\text{Clock cycle}}$$

DEC engineers later showed2 that
the more complicated CISC ISA executed about 75% of the number of instructions per program that RISC did (the first
term), but in a similar technology CISC
took about five to six times as many clock
cycles per instruction (the second
term), making RISC microprocessors
approximately 4× faster.
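The arithmetic behind the 4× figure can be checked with a short Python sketch of the performance formula. It is an illustration under stated assumptions, not DEC's methodology: every RISC term is normalized to 1.0, CISC executes 0.75 times as many instructions, the CPI gap is treated as a ratio of five to six, and both designs are assumed to have the same clock cycle time, which is how we read "a similar technology." The function names `time_per_program` and `risc_speedup` are hypothetical.

```python
def time_per_program(instructions, cycles_per_instruction, time_per_cycle):
    """Time/Program = Instructions/Program x Clock cycles/Instruction x Time/Clock cycle."""
    return instructions * cycles_per_instruction * time_per_cycle


def risc_speedup(cisc_instr_ratio, cisc_cpi_ratio, cisc_cycle_ratio=1.0):
    """How many times faster RISC runs, with each CISC term expressed relative to RISC's."""
    risc = time_per_program(1.0, 1.0, 1.0)  # assumption: RISC normalized to 1 in every term
    cisc = time_per_program(cisc_instr_ratio, cisc_cpi_ratio, cisc_cycle_ratio)
    return cisc / risc


# Assumed ratios: ~75% of RISC's instruction count, 5 to 6 times its CPI, same clock.
for cpi_ratio in (5.0, 6.0):
    print(f"CPI ratio {cpi_ratio:.0f}: RISC about {risc_speedup(0.75, cpi_ratio):.2f}x faster")
```

The two cases give 0.75 × 5 = 3.75 and 0.75 × 6 = 4.5, bracketing "approximately 4×." Because the formula is a product, CISC's modest saving in instruction count cannot offset a CPI that is several times higher.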
Such formulas were not part of computer architecture books in the 1980s,
leading us to write Computer Architecture: A Quantitative Approach13 in 1989.
The subtitle suggested the theme of the
book: Use measurements and benchmarks to evaluate trade-offs quantitatively instead of relying more on the
architect’s intuition and experience, as
in the past. The quantitative approach
we used was also inspired by what Turing laureate Donald Knuth’s book had
done for algorithms.20
VLIW, EPIC, Itanium. The next ISA
innovation was supposed to succeed
both RISC and CISC. Very long instruction
word (VLIW)7 and its cousin, the
explicitly parallel instruction computer
(EPIC), the name Intel and Hewlett
Packard gave to the approach, used wide
instructions with multiple independent
operations bundled together in each
instruction. VLIW and EPIC advocates
at the time believed if a single instruction
could specify, say, six independent
face created an opportunity for architecture innovation.
Turing laureate John Cocke and his
colleagues developed simpler ISAs and
compilers for minicomputers. As an
experiment, they retargeted their research compilers to use only the simple
register-register operations and load-store data transfers of the IBM 360 ISA,
avoiding the more complicated instructions. They found that programs ran up
to three times faster using the simple
subset. Emer and Clark6 found that 20% of
the VAX instructions needed 60% of the
microcode and represented only 0.2%
of the execution time. One author
(Patterson) spent a sabbatical at DEC to
help reduce bugs in VAX microcode. If
microprocessor manufacturers were
going to follow the CISC ISA designs
of the larger computers, he thought
they would need a way to repair the
microcode bugs. He wrote such a
paper,31 but the journal Computer
rejected it. Reviewers opined that it was
a terrible idea to build microprocessors with ISAs so complicated that they
needed to be repaired in the field. That
rejection called into question the value
of CISC ISAs for microprocessors. Ironically, modern CISC microprocessors
do indeed include microcode repair
mechanisms, but the main result of his
paper rejection was to inspire him to
work on less-complex ISAs for microprocessors—reduced instruction set
computers (RISC).
These observations and the shift to
high-level languages created the opportunity to switch from CISC to RISC. First,
the RISC instructions were simplified
so there was no need for a microcoded interpreter. The RISC instructions
were typically as simple as microinstructions and could be executed directly by the hardware. Second, the
fast memory, formerly used for the
microcode interpreter of a CISC ISA,
was repurposed to be a cache of RISC
instructions. (A cache is a small, fast
memory that buffers recently executed instructions, as such instructions
are likely to be reused soon.) Third,
register allocators based on Gregory
Chaitin’s graph-coloring scheme made
it much easier for compilers to efficiently use registers, which benefited these
register-register ISAs.3 Finally, Moore’s
Law meant there were enough transistors in the 1980s to include a full 32-bit

In today’s post-PC era, x86 shipments have fallen almost 10% per year since the peak in 2011, while chips with RISC processors have skyrocketed to 20 billion.