processor and memory. 33, 37 In these
hierarchies, the lowest-level caches
were small but fast enough to match
the processor’s needs in terms of high
bandwidth and low latency; higher levels of the cache hierarchy were then
optimized for size and speed.
Figure 5 outlines the evolution of on-die caches over the past two decades, plotting cache capacity (a) and percentage of die area (b) for Intel microprocessors. At first, cache sizes increased slowly, with decreasing die area devoted to cache, and most of the available transistor budget was devoted to core microarchitecture advances. During this period, processors were probably cache-starved. As energy became a concern, increasing cache size for performance proved more energy efficient than additional core-microarchitecture techniques requiring energy-intensive logic. For this reason, more and more transistor budget and die area are allocated to caches.
Figure 5. Evolution of on-die caches: (a) on-die cache capacity in KB (log scale, 1 to 10,000) and (b) on-die cache as a percentage of total die area (0% to 60%), across process generations from 1u to 65nm.
architecture-improvement cycle has
been sustained for more than two
decades, delivering a 1,000-fold performance improvement. How long will it
continue? To better understand and
predict future performance, we decouple the performance gain due to transistor
speed from the gain due to microarchitecture by comparing the same microarchitecture
on different process technologies, and
new microarchitectures against their predecessors, then compounding the resulting gains.
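The decoupling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-generation speedup factors are hypothetical placeholders, not data from the article, and the helper names `compound` and `decompose` are invented for this sketch. It assumes the two kinds of gain are independent and multiply.

```python
from math import prod

# Hypothetical per-generation speedups (illustrative values, NOT from the article).
# Same microarchitecture on a newer process: isolates the transistor-speed gain.
speed_gains = [1.5, 1.4, 1.4]
# New microarchitecture vs. its predecessor on the same process: isolates the design gain.
uarch_gains = [1.3, 1.2, 1.1]


def compound(gains):
    """Multiply per-generation speedup factors into one cumulative factor."""
    return prod(gains)


def decompose(speed_gains, uarch_gains):
    """Return the cumulative gain from each source and their combined product."""
    speed = compound(speed_gains)
    uarch = compound(uarch_gains)
    return {
        "transistor_speed": speed,
        "microarchitecture": uarch,
        "total": speed * uarch,  # assumes the two sources multiply independently
    }


if __name__ == "__main__":
    for name, value in decompose(speed_gains, uarch_gains).items():
        print(f"{name}: {value:.2f}x")
```

Because the factors multiply, the split is easiest to read on a log scale, where the two contributions add.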
Figure 6 divides the cumulative
1,000-fold Intel microprocessor performance increase over the past two
decades into the portion delivered by
transistor speed (frequency) and the portion due to
microarchitecture. Almost two orders of magnitude of this performance increase is due to transistor speed alone,
which is now leveling off due to the numerous
challenges described in the following
sections.
Figure 6. Performance increase separated into transistor speed and microarchitecture performance.
The Next 20 Years
Microprocessor technology has delivered three-orders-of-magnitude performance improvement over the past
two decades, so continuing this trajectory would require at least 30x performance increase by 2020. Micropro-
[Figure 6 panels: (a) integer performance and (b) floating-point performance, each shown alongside transistor performance on a relative log scale (1 to 10,000) across process generations from 1.5u to 65nm.]
Table 1. New technology scaling challenges.
Decreased transistor scaling benefits: Despite continuing miniaturization, little performance improvement and little reduction in switching energy (decreasing performance benefits of scaling) [ITRS].
Flat total energy budget: Package power and mobile/embedded computing drive energy-efficiency requirements.
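The 30x target quoted at the start of this section can be checked with a short calculation. The sketch below assumes a constant compound annual growth rate over the past two decades and a horizon of roughly ten years to 2020; both are assumptions made for illustration, and the function names are hypothetical.

```python
def annual_growth(total_gain, years):
    """Constant yearly factor that compounds to total_gain over the given years."""
    return total_gain ** (1.0 / years)


def projected_gain(total_gain, past_years, future_years):
    """Gain over future_years if the historical yearly factor simply continues."""
    return annual_growth(total_gain, past_years) ** future_years


if __name__ == "__main__":
    # 1,000-fold over 20 years, extended 10 more years (assumed horizon to 2020).
    rate = annual_growth(1000.0, 20)
    target = projected_gain(1000.0, 20, 10)
    print(f"yearly factor: {rate:.3f}x")
    print(f"10-year gain at the same rate: {target:.1f}x")
```

Under these assumptions the yearly factor is about 1.41, and ten more years at that rate compounds to the square root of 1,000, or roughly 31.6x, which is consistent with the "at least 30x" figure.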
Figure 7. Unconstrained evolution of a microprocessor results in excessive power consumption.
[Plot: power in watts (0 to 500) for unconstrained evolution of a 100mm2 die, 2002 to 2014.]
Table 2. Ongoing technology scaling.
Increasing transistor density (in area and volume) and count: through continued feature scaling, process innovations, and packaging innovations.
Need for increasing locality and reduced bandwidth per operation: as performance of the microprocessor increases, and the data sets for applications continue to grow.