fect of such a network on power consumption. Figure 11(b) shows the energy consumed in moving a bit across a hop in such a network, measured in historic networks and extrapolated into the future from the assumptions described earlier. If only 10% of the operands move over the network, traversing 10 hops on average, then at 0.06 pJ/bit per hop the network power would be 35 watts, more than half the power budget of the processor.
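This back-of-envelope estimate can be reproduced with a short sketch; the hop count, network fraction, and per-hop energy are the figures quoted above, while the 64-bit operand width and the operand rate (roughly nine trillion operands referenced per second, consistent with a multi-teraop processor) are illustrative assumptions, not numbers given in the text.

```python
# Reproduce the ~35W on-die network power estimate.
# From the text: 10% of operands cross the network, 10 hops on average,
# 0.06 pJ per bit per hop. Assumed: 64-bit operands, ~9e12 operands/s.
ENERGY_PER_BIT_PER_HOP = 0.06e-12   # joules (0.06 pJ/bit/hop)
HOPS = 10
OPERAND_BITS = 64
NETWORK_FRACTION = 0.10
OPERAND_RATE = 9e12                 # operands referenced per second (assumption)

network_power = (OPERAND_RATE * NETWORK_FRACTION * OPERAND_BITS
                 * HOPS * ENERGY_PER_BIT_PER_HOP)
print(f"Estimated network power: {network_power:.1f} W")  # ~34.6 W
```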
As the energy cost of computation is reduced by voltage scaling (described later) to emphasize compute throughput, the cost of data movement starts to dominate. Therefore, data movement must be restricted by keeping data local as much as possible. This restriction also means the size of local storage (such as a register file) must increase substantially, contrary to the conventional thinking that register files should be small and thus fast.
With voltage scaling the frequency of
operation is lower anyway, so it makes
sense to increase the size of the local
storage at the expense of speed.
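A minimal comparison makes the point; the compute and register-file energies below are assumed values chosen for illustration, and only the network cost follows from the figures above (64 bits x 10 hops x 0.06 pJ/bit).

```python
# Energy per operation vs. energy to fetch an operand, locally or remotely.
# OP_ENERGY_PJ and LOCAL_RF_PJ_PER_OPERAND are illustrative assumptions.
OP_ENERGY_PJ = 2.0                        # scaled-down compute energy per op
LOCAL_RF_PJ_PER_OPERAND = 0.5             # enlarged, slower local register file
NETWORK_PJ_PER_OPERAND = 64 * 10 * 0.06   # 38.4 pJ to cross the die

print(f"network fetch vs. compute: {NETWORK_PJ_PER_OPERAND / OP_ENERGY_PJ:.0f}x")
print(f"local fetch vs. compute:   {LOCAL_RF_PJ_PER_OPERAND / OP_ENERGY_PJ:.2f}x")
```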
Another radical departure from
conventional thinking is the role of
the interconnect network on the chip.
Recent parallel machine designs have been dominated by packet switching,6,8,24,40 so multicore networks adopted this energy-intensive approach. In the future, data movement over these networks must be limited to conserve energy, and, more important, because local storage will be larger, data-bandwidth demand on the network will be reduced. In light of these findings, on-die network architectures need revolutionary approaches (such as hybrid packet/circuit switching4).
Many older parallel machines used irregular and circuit-switched networks;31,41 Figure 12 describes a return to hybrid switched networks for on-chip interconnects. Small cores in close proximity could be interconnected into clusters with traditional busses that are energy efficient for data movement over short distances. The clusters could be connected through wide (high-bandwidth), low-swing (low-energy) busses or through packet- or circuit-switched networks, depending on distance. Hence the network-on-a-chip could be hierarchical and heterogeneous, a radical departure from the traditional parallel-machine approach (see Table 5).
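One way to picture such a hierarchy is a simple tier-selection rule; the bus energies and the two-hop cutoff below are hypothetical values chosen for illustration, and only the switched-network cost comes from the earlier estimate.

```python
# Hypothetical three-tier on-die interconnect: local bus within a cluster,
# wide low-swing bus between nearby clusters, packet/circuit-switched
# fabric for long-distance transfers. Energy numbers are assumptions.
LOCAL_BUS_PJ_PER_BIT = 0.01          # short intra-cluster bus (assumed)
LOW_SWING_BUS_PJ_PER_BIT = 0.03      # wide low-swing inter-cluster bus (assumed)
SWITCHED_PJ_PER_BIT_PER_HOP = 0.06   # switched-network hop (from the text)

def transfer_energy_pj(bits, hops, same_cluster):
    """Estimate transfer energy using the cheapest tier the distance allows."""
    if same_cluster:
        return bits * LOCAL_BUS_PJ_PER_BIT
    if hops <= 2:                    # nearby clusters share a low-swing bus
        return bits * LOW_SWING_BUS_PJ_PER_BIT
    return bits * hops * SWITCHED_PJ_PER_BIT_PER_HOP

print(transfer_energy_pj(64, 0, True))    # 0.64 pJ within a cluster
print(transfer_energy_pj(64, 10, False))  # 38.4 pJ across the die
```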
Figure 13. Improving energy efficiency through voltage scaling. (Two panels, 65nm CMOS at 50°C: maximum frequency (MHz) and total power (W) versus supply voltage (V), and energy efficiency (GOP/Watt) and active leakage power (mW) versus supply voltage (V); the subthreshold region lies below the 320mV threshold.)
Table 6. Circuits challenges, trends, directions.

Challenge: Power, energy efficiency
Near-term: Continuous dynamic voltage and frequency scaling, power gating, reactive power management
Long-term: Discrete dynamic voltage and frequency scaling, near-threshold operation, proactive fine-grain power and energy management

Challenge: Variation
Near-term: Speed binning of parts, corrections with body bias or supply-voltage changes, tighter process control
Long-term: Dynamic reconfiguration of many cores by speed

Challenge: Gradual, temporal, intermittent, and permanent faults
Near-term: Guard-bands, yield loss, core sparing, design for manufacturability
Long-term: Resilience with hardware/software co-design; dynamic in-field detection, diagnosis, reconfiguration, and repair; adaptability and self-awareness
The role of the microprocessor architect must expand beyond the processor core, into the whole platform on
a chip, optimizing the cores as well as
the network and other subsystems.
Pushing the envelope: Extreme circuits, variability, resilience. Our analysis showed that in the power-constrained scenario, only 150 million logic transistors for processor cores and 80MB of cache will be affordable within the energy budget by 2018. Note that 80MB of cache is not necessary for this system, and a large portion of the cache-transistor budget can be used to integrate even more cores, provided it can be done at the power-consumption density of a cache, which is 10x less than that of logic. This approach can be achieved through aggressive scaling of supply voltage.25
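The re-budgeting argument can be sketched numerically; the 150 million logic transistors, the 80MB cache, and the 10x power-density ratio come from the analysis above, while the six-transistor bit cell and the 10% reuse fraction are assumptions made only for illustration.

```python
# Re-purpose part of the cache-transistor budget for cores that operate at
# cache-like power density (via aggressive voltage scaling).
LOGIC_XTORS = 150e6              # logic transistors affordable for cores (text)
CACHE_XTORS = 80e6 * 8 * 6       # ~80MB of cache at ~6 transistors/bit (assumed)
CACHE_REL_POWER_DENSITY = 0.1    # cache power density relative to logic (text)

reused = 0.10 * CACHE_XTORS      # fraction of cache budget re-used (assumed)
extra_power = reused * CACHE_REL_POWER_DENSITY / LOGIC_XTORS
print(f"{reused/1e6:.0f}M extra core transistors "
      f"(~{(LOGIC_XTORS + reused)/LOGIC_XTORS:.1f}x the core budget) "
      f"for ~{extra_power:.0%} more core power")
```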
Figure 13 outlines the effectiveness of supply-voltage scaling when the chip is designed for it. As the supply voltage is reduced, frequency also drops, but energy efficiency increases. When the supply voltage is reduced all the way to the transistor’s threshold, energy efficiency increases by an order of magnitude.
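A toy model, not the authors’, illustrates why: dynamic energy per operation falls roughly as V², while frequency, and with it the leakage charged to each operation, collapses only close to the threshold. The constants below are purely illustrative; only the trend, roughly an order-of-magnitude gain in efficiency, is the point.

```python
# Toy near-threshold model: dynamic energy ~ C*V^2; frequency follows a
# simple alpha-power law; a fixed leakage current is charged per operation.
# All constants are illustrative assumptions.
V_NOM, V_T = 1.2, 0.32      # nominal supply (assumed) and threshold (Figure 13)
C_EFF = 1e-9                # switched capacitance per operation, farads (assumed)
I_LEAK = 5e-3               # leakage current, amperes (assumed)
ALPHA = 1.5                 # alpha-power-law exponent (assumed)
F_NOM = 4e9                 # clock frequency at nominal supply, hertz (assumed)

def energy_per_op(v):
    freq = F_NOM * ((v - V_T) / (V_NOM - V_T)) ** ALPHA
    return C_EFF * v * v + (I_LEAK * v) / freq   # dynamic + leakage, joules

for v in (1.2, 0.9, 0.6, 0.4):
    print(f"{v:.1f} V: {1e-9 / energy_per_op(v):.1f} GOPS/W")
# Efficiency rises roughly 8x as the supply approaches the threshold.
```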
Employing this technique on large cores would dramatically reduce single-thread performance and is hence not recommended. However, smaller cores used
Figure 14. A heterogeneous many-core system with variation. (Two large cores deliver single-thread performance; an array of small cores delivers throughput performance, with individual cores running at f, f/2, or f/4 due to variation; the system is energy efficient with fine-grain power management.)