pretty good new days, as progress
continues—will be more difficult,
with Moore’s Law scaling producing
continuing improvement in transistor density but comparatively little
improvement in transistor speed and
energy. As a result, the frequency of
operation will increase slowly. Energy
will be the key limiter of performance,
forcing processor designs to use large-scale parallelism with heterogeneous
cores, or a few large cores and a large
number of small cores operating at
low frequency and low voltage, near
threshold. Aggressive use of customized accelerators will yield the highest
performance and greatest energy efficiency on many applications. Efficient
data orchestration will become increasingly
critical, driving more efficient
memory hierarchies and new types of
interconnect that are tailored for locality and
depend on sophisticated software
to place computation and data so as to
minimize data movement. The objective is ultimately the purest form of
energy-proportional computing at the
lowest possible levels of energy. Heterogeneity in compute and communication hardware will be essential to
optimize performance under energy-proportional computing and to cope
with variability. Finally, programming
systems will have to comprehend
these restrictions and provide tools
and environments to harvest the performance.
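The energy argument above can be illustrated with the textbook CMOS dynamic-power model, P = C·V²·f. The sketch below is only a back-of-the-envelope comparison under stated assumptions: the normalized capacitance, voltages, frequencies, and core count are hypothetical values chosen for illustration (they are not taken from this article), leakage power is ignored even though it matters near threshold, and the workload is assumed to parallelize perfectly.

```python
def dynamic_power(capacitance, voltage, frequency):
    """Textbook CMOS dynamic power: P = C * V^2 * f."""
    return capacitance * voltage ** 2 * frequency


def energy_per_op(capacitance, voltage):
    """Dynamic energy per switching operation: P / f = C * V^2."""
    return capacitance * voltage ** 2


# Hypothetical, normalized operating points (assumptions, not measured data).
C = 1.0                      # switched capacitance, same for both core types
BIG_V, BIG_F = 1.0, 1.0      # one large core at nominal voltage and frequency
SMALL_V, SMALL_F = 0.5, 0.25 # each small core at reduced voltage and frequency
N_SMALL = 4                  # enough small cores to match the big core's rate

# Throughput is modeled as (number of cores) * frequency, which assumes
# a perfectly parallel workload and equal work per cycle on every core.
big_throughput = 1 * BIG_F
small_throughput = N_SMALL * SMALL_F

big_power = dynamic_power(C, BIG_V, BIG_F)
small_power = N_SMALL * dynamic_power(C, SMALL_V, SMALL_F)

print("throughput (big, small):", big_throughput, small_throughput)
print("dynamic power (big, small):", big_power, small_power)
print("energy per op (big, small):",
      energy_per_op(C, BIG_V), energy_per_op(C, SMALL_V))
```

Under these assumptions the four slow cores match the single fast core's throughput while drawing a quarter of the dynamic power, because energy per operation falls with the square of the supply voltage. Real designs would see a smaller gain once leakage, serial code, and the extra data movement between cores are accounted for.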
While no one can reliably predict
the end of Si CMOS scaling, many electrical
engineers, anticipating that future regime,
have begun exploring new
types of switches and materials (such
as compound semiconductors, carbon
nanotubes, and graphene) with different performance and scaling characteristics from Si CMOS, posing new
types of design and manufacturing
challenges. However, all such technologies are in their infancy; they are probably not
ready to replace silicon in the next decade, and they will pose the same challenges
with continued scaling. Quantum
electronics (such as quantum dots)
are even farther out and, when realized,
will pose major challenges of their own,
with yet newer models of computation,
architecture, manufacturing, variability, and resilience.
Because the future winners are far
from clear today, it is far too early to
predict whether some form of scaling
(perhaps energy) will continue or there
will be no scaling at all. The pretty
good old days of scaling that processor
design faces today are helping prepare
us for these new challenges. Moreover,
the challenges processor design
will face in the next decade will be
dwarfed by the challenges posed by
these alternative technologies, rendering
today’s challenges a warm-up exercise
for what lies ahead.
Acknowledgments
This work was inspired by the Exascale study working groups chartered in
2007 and 2008 by Bill Harrod of DARPA. We thank him and the members of,
and presenters to, the working groups
for valuable, insightful discussions
over the past few years. We also thank
our colleagues at Intel who have improved our understanding of these issues through many thoughtful discussions. Thanks, too, to the anonymous
reviewers whose extensive feedback
greatly improved the article.
Shekhar Borkar (Shekhar.y.Borkar@intel.com) is an
Intel Fellow and director of exascale technology at Intel
Corporation, Hillsboro, OR.
Andrew A. Chien (andrew.Chien@alum.mit.edu) is
former vice president of research at Intel Corporation and
currently adjunct professor in the Computer Science and
Engineering Department at the University of California,
San Diego.