Figure 4. Scaling of transistor feature sizes over time. Up to the 130nm node, feature
size scaled every two to three years. Since the 90nm generation, feature size scaling
has accelerated to every two years.
[Plot: feature size on a logarithmic vertical axis (1.5 um down to 32 nm) versus year (1985 to 2015); series: Intel, IBM, AMD, Other.]
Figure 5. In modern chips, the number of features per transistor has started to grow.
[Plot: features per transistor on a logarithmic vertical axis (10 to 1000) versus year (1985 to 2010).]

clock frequency in that technology using gate-delay data. While the speed
of the cache memory on the processor scales with technology, the delay going
to main memory has scaled only slowly with time. As a result, doubling
the clock frequency generally does not double the processor’s performance.
We finesse this issue the same way the microprocessor industry does: by
scaling the on-chip cache so the percentage memory stall time remains
constant. Using the empirical rule that miss rates are inversely proportional to
the square root of the cache size [9, 14], we expand the last-level cache by four
times for each doubling of clock frequency. Thus, we assume that the processor
performance scales with clock frequency, but we penalize the energy
and area of the processor by growing its cache.

For the clock-cycle time estimate, we need to know how the delays of
the gates and wires will scale. Fortunately, the delay scaling of different
logic gates is similar, so it is sufficient to measure how the delay of a single
gate scales. Our analysis uses the delay of an inverter driving four equivalent
inverters (a fanout of four, or FO4) as the gate-speed metric. Inverters are
the most common gate type, and their delay is often published in technology
papers. For wire delay it is important to remember that a design’s area will
shrink with scaling, so its wire delay will, in general, reduce slowly or, at
worst, stay constant. Its effect on cycle time depends on the internal circuit
design. Designers generally pipeline long wires, so they tend not to limit the
critical path. Thus, we ignore wire delay and make the slightly optimistic
assumption that a processor’s frequency in the new technology will be greater
by the ratio of FO4s from old to new:

$$f_2 = f_1 \, \frac{\mathrm{FO4}_1}{\mathrm{FO4}_2}$$
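The two scaling rules described in the text (frequency grows by the ratio of old to new FO4 delay, and the last-level cache grows four times per doubling of clock frequency) can be sketched as below. This is only an illustrative sketch: the function names, units, and the numbers in the usage lines are hypothetical and are not taken from the article or its data set.

```python
def scaled_frequency(f_old_hz: float, fo4_old_ps: float, fo4_new_ps: float) -> float:
    """Estimate frequency in the new technology: f2 = f1 * FO4_1 / FO4_2.

    Wire delay is ignored, matching the slightly optimistic assumption in the text.
    """
    if fo4_old_ps <= 0 or fo4_new_ps <= 0:
        raise ValueError("FO4 delays must be positive")
    return f_old_hz * fo4_old_ps / fo4_new_ps


def scaled_cache_size(cache_old_bytes: float, f_old_hz: float, f_new_hz: float) -> float:
    """Grow the last-level cache 4x for each doubling of clock frequency.

    With r = f_new / f_old, 4 ** log2(r) equals r ** 2. If miss rate falls as
    1 / sqrt(cache size), this cuts the miss rate by a factor of r.
    """
    if f_old_hz <= 0 or f_new_hz <= 0:
        raise ValueError("frequencies must be positive")
    ratio = f_new_hz / f_old_hz
    return cache_old_bytes * ratio ** 2


# Hypothetical example: a 1 GHz design with a 1 MiB cache, moved to a
# technology whose FO4 delay is half as long (100 ps -> 50 ps).
f_new = scaled_frequency(1e9, fo4_old_ps=100.0, fo4_new_ps=50.0)
cache_new = scaled_cache_size(1024 * 1024, 1e9, f_new)
```

Under these assumed inputs the frequency doubles, so the cache is quadrupled to keep the fraction of time stalled on memory roughly constant.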
Using FO4 as a basic metric has an
additional advantage: it cleanly covers the performance/energy variation
that comes from changing the supply
voltage. Two processors, even built in
the same technology, might be operated at different supply voltages. The
energy difference between the two can
be calculated directly from the supply
voltage, but the voltage’s effect on performance is harder to estimate. Using
FO4 data for these designs at two different voltages provides all the information that is needed.
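As a hedged illustration of this point, assume the switching energy scales with the square of the supply voltage, and write FO4(V) for the measured FO4 delay at supply voltage V (notation introduced here for illustration, not the article's). Two operating points of the same design in the same technology then compare as:

```latex
\frac{E_2}{E_1} = \left(\frac{V_2}{V_1}\right)^{2},
\qquad
\frac{f_2}{f_1} = \frac{\mathrm{FO4}(V_1)}{\mathrm{FO4}(V_2)}
```

For example, lowering the supply from a hypothetical 1.0 V to 0.9 V gives an energy ratio of 0.81, while the frequency ratio has to be read from the FO4 delays measured at the two voltages.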
Having accounted for the effect of
the scaled memory systems, we find
that estimating the power of a processor with scaled technology is fairly
straightforward. Processor power has
two components: dynamic and leakage. In an optimized design, the leakage power is around 30% of the dynamic power, and the leakage power will
scale as the dynamic power scales [16].
Dynamic power is given by the product of the processor’s average activity
factor, α (the probability that a node
will switch each cycle), the processor
frequency, and the energy to switch the
transistors:
$$\mathrm{Energy} = C \, V_{dd}^{2}$$
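The power model described above can be written out as a small sketch: dynamic power is the product of activity factor, frequency, and switching energy, and leakage is taken as roughly 30% of dynamic power. The function names, the default leakage fraction as a parameter, and the example values are assumptions made for illustration only.

```python
def switching_energy(c_farads: float, vdd_volts: float) -> float:
    """Energy to switch the transistors: C * Vdd^2."""
    return c_farads * vdd_volts ** 2


def dynamic_power(alpha: float, f_hz: float, c_farads: float, vdd_volts: float) -> float:
    """Dynamic power = activity factor * frequency * switching energy."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is a probability and must lie in [0, 1]")
    return alpha * f_hz * switching_energy(c_farads, vdd_volts)


def total_power(p_dynamic_watts: float, leakage_fraction: float = 0.30) -> float:
    """Total power, assuming leakage is a fixed fraction of dynamic power.

    The 0.30 default follows the 'around 30%' figure quoted in the text for
    an optimized design; treat it as an approximation.
    """
    return p_dynamic_watts * (1.0 + leakage_fraction)


# Hypothetical example: alpha = 0.1, 1 GHz, 1 nF of switched capacitance, 1.0 V.
p_dyn = dynamic_power(alpha=0.1, f_hz=1e9, c_farads=1e-9, vdd_volts=1.0)
p_total = total_power(p_dyn)
```

Because the activity factor is treated as constant with scaling, only the frequency, capacitance, and supply voltage need to be updated when moving this estimate to a new technology.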
The processor’s average activity factor depends on the logic and not the
technology, so it is constant with scaling. Since capacitance per unit length
is roughly constant with scaling, C
should be proportional to the feature
size λ. We have already estimated how