for throughput would certainly benefit
from it. Moreover, the transistor budget from the unused cache could be
used to integrate even more cores operating at
the power density of the cache. Aggressive voltage scaling provides an avenue
for utilizing the unused transistor-integration capacity for logic to deliver
higher performance.
Aggressive supply-voltage scaling
comes with its own challenges (such
as variations). As supply voltage is reduced toward a transistor’s threshold
voltage, the effect of variability is even
worse, because the speed of a circuit
is proportional to the voltage overdrive (supply voltage minus threshold
voltage). Moreover, as supply voltage approaches the threshold, even a small change in threshold voltage has a large effect on circuit speed. Variation in the threshold voltage therefore manifests itself as variation in the speed of the core; because the slowest circuit in a core determines the core's operating frequency, a large core is more susceptible to reduced frequency of operation due to variations. On the
other hand, a large number of small
cores has a better distribution of fast
and slow small cores and can better
even out the effect of variations. We
next discuss an example system that
is variation-tolerant, energy-efficient,
energy-proportional, and fine-grain
power managed.
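The variation argument above can be illustrated with a toy Monte Carlo model. The voltage values, the variation sigma, and the path counts below are invented for illustration, and circuit speed is modeled simply as proportional to overdrive (Vdd - Vt):

```python
import random

random.seed(42)  # reproducible draws

VDD = 0.6      # near-threshold supply voltage (V) -- assumed value
VT_NOM = 0.3   # nominal threshold voltage (V) -- assumed value
SIGMA = 0.03   # std. dev. of threshold-voltage variation (V) -- assumed

def path_speed(vt):
    """Relative speed of one critical path, proportional to the overdrive."""
    return max(VDD - vt, 0.0)

def core_freq(n_paths):
    """A core can only clock as fast as its slowest critical path."""
    return min(path_speed(random.gauss(VT_NOM, SIGMA)) for _ in range(n_paths))

# One large core with 1,000 critical paths vs. ten small cores with 100 each.
large = core_freq(1000)
small = [core_freq(100) for _ in range(10)]

print(f"large core (relative frequency): {large:.3f}")
print(f"small cores, slowest..fastest:   {min(small):.3f}..{max(small):.3f}")
print(f"small cores, mean:               {sum(small) / len(small):.3f}")
```

In this toy model the large core is dragged down to its single worst path, while the ten small cores show a spread of fast and slow parts that a scheduler can exploit, which is the distribution argument made above.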
A hypothetical heterogeneous processor (see Figure 14) consists of a small number of large cores for single-thread performance and many small cores for throughput performance. The supply voltage and frequency of each core are individually controlled such that the total power consumption stays within the power envelope. Many small cores operate at lower voltage and frequency for improved energy efficiency, some small cores operate near threshold voltage at the lowest frequency but at even higher energy efficiency, and some cores may be turned off completely. Clock frequencies need not be continuous; steps (in powers of two) keep the system synchronous and simple without compromising performance, while also addressing variation tolerance. The scheduler dynamically monitors the workload, configures the system with the proper mix of cores, and schedules the workload on the right cores for energy-proportional computing. Combined heterogeneity, aggressive supply-voltage scaling, and fine-grain power (energy) management enable utilization of a larger fraction of transistor-integration capacity, moving closer to the goal of a 30x increase in compute performance (see Table 6).
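The per-core voltage/frequency control described above can be sketched as a toy model. The frequency/voltage levels, the P ∝ V²·f power model, and the greedy policy below are our own illustrative assumptions, not the article's scheduler:

```python
# Each core gets one of a few discrete frequency steps (in powers of two),
# chosen so that total power stays within a fixed envelope.

# (relative frequency step, supply voltage) pairs -- illustrative values only.
LEVELS = [
    (0.0, 0.0),     # core turned off completely
    (0.125, 0.40),  # near threshold: lowest frequency, best energy efficiency
    (0.25, 0.50),
    (0.5, 0.70),
    (1.0, 1.00),    # full frequency at nominal voltage
]

def power(level):
    """Relative dynamic power of a core at a level, modeled as P ~ V^2 * f."""
    f, v = LEVELS[level]
    return v * v * f

def configure(n_cores, envelope):
    """Greedily raise cores one level at a time while the envelope allows."""
    levels = [1] * n_cores  # start every core near threshold
    used = sum(power(l) for l in levels)
    for i in range(n_cores):
        while levels[i] + 1 < len(LEVELS):
            delta = power(levels[i] + 1) - power(levels[i])
            if used + delta > envelope:
                break  # promoting this core further would break the envelope
            levels[i] += 1
            used += delta
    return levels, used

levels, used = configure(n_cores=8, envelope=2.0)
print("levels:", levels, "total power:", round(used, 3))
```

With these numbers the greedy pass yields the mix described above: one core at full frequency, a few at intermediate steps, and the rest held near threshold, with total power under the envelope.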
advanced interpretive and compiler technologies, as well as increasing use of dynamic translation techniques. We expect these trends to continue, with higher-level programming, extensive customization through libraries, and sophisticated automated performance-search techniques (such as autotuning) becoming even more important.

Extreme studies27,38 suggest that aggressive high-performance and extreme-energy-efficient systems may go further, eschewing the overhead of programmability features that software engineers have come to take for granted; for example, these future systems may drop hardware support for a single flat address space (which normally wastes energy on address manipulation/computing), a single memory hierarchy (coherence and monitoring energy overhead), and a steady rate of execution (adapting to the available energy budget). These systems will place more of these components under software control, depending on increasingly sophisticated software tools to manage the hardware boundaries and irregularities with greater energy efficiency. In extreme cases, high-performance computing and embedded applications may even manage these complexities explicitly. Most architectural features and techniques we've discussed here shift more responsibility for distribution of the computation and data across the compute and storage elements of microprocessors to software.13,18 Shifting responsibility increases potential achievable energy efficiency, but realizing it depends on significant advances in applications, compilers and runtimes, and operating systems to understand and even predict application and workload behavior.7,16,19 However, these advances require radical research breakthroughs and major changes in software practice (see Table 7).

Table 7. Software challenges, trends, directions.

Challenge | Near-term | Long-term
1,000-fold software parallelism | Data-parallel languages and "mapping" of operators; library- and tool-based approaches | New high-level languages; compositional and deterministic frameworks
Energy-efficient data movement and locality | Manual control and profiling, maturing to automated techniques (auto-tuning, optimization) | New algorithms, languages, program analysis, runtime, and hardware techniques
Energy management | Automatic fine-grain hardware management | Self-aware runtime and application-level techniques that exploit architecture features for visibility and control
Resilience | Algorithmic, application-software approaches; adaptive checking and recovery | New hardware-software partnerships that minimize checking and recomputation energy
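Autotuning, mentioned above as an automated performance-search technique, amounts to empirically timing several implementation variants and keeping the fastest. A minimal sketch (the kernel, the candidate block sizes, and all names are invented for illustration):

```python
import timeit

DATA = list(range(1 << 16))  # toy workload

def blocked_sum(data, block):
    """Sum `data` in chunks of `block` elements (the tunable parameter)."""
    total = 0
    for start in range(0, len(data), block):
        total += sum(data[start:start + block])
    return total

def autotune(candidates, repeats=3):
    """Return the candidate block size with the best measured runtime."""
    def runtime(block):
        return min(timeit.repeat(lambda: blocked_sum(DATA, block),
                                 number=5, repeat=repeats))
    return min(candidates, key=runtime)

best = autotune([64, 256, 1024, 4096])
print("selected block size:", best)
# Tuning must never change the result, only how fast it is computed.
assert blocked_sum(DATA, best) == sum(DATA)
```

Which block size wins depends on the machine; that hardware dependence is exactly why the search is automated rather than hard-coded.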
Conclusion
The past 20 years were truly the great
old days for Moore’s Law scaling and
microprocessor performance; dramatic improvements in transistor
density, speed, and energy, combined
with microarchitecture and memory-hierarchy techniques, delivered 1,000-
fold microprocessor performance
improvement. The next 20 years—the