Table 3. Extrapolated transistor integration capacity in a fixed power envelope.

Year    Logic transistors (millions)    Cache (MB)
2008    50                              6
2014    100                             25
2018    150                             80
…latency. Clearly, both techniques—multiple cores and customization—can improve energy efficiency, the new fundamental limiter to capability.
Choices in multiple cores. Multiple cores increase computational throughput by exploiting Moore's Law to replicate cores. If the software has no parallelism, there is no performance benefit. However, if there is parallelism, the computation can be spread across multiple cores, increasing overall computational performance (and reducing latency). Extensive research on how to organize such systems dates to the 1970s [29, 39].
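How much parallelism helps can be made precise with Amdahl's law, a standard bound rather than anything specific to this article: if a fraction f of a computation parallelizes perfectly across n cores, the speedup is

```latex
S(n) = \frac{1}{(1-f) + f/n},
\qquad
\lim_{n\to\infty} S(n) = \frac{1}{1-f}.
```

With f = 0.9, for example, even unboundedly many cores yield at most a 10x speedup; the serial fraction, not the core count, bounds the multicore payoff.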
Industry has widely adopted a multicore approach, sparking many questions about the number of cores, the size and power of each core, and how they coordinate [6, 36]. But if we employ 25-million-transistor cores (circa 2008), the 150-million-logic-transistor budget expected in 2018 gives a 6x potential throughput improvement (2x from frequency and 3x from increased logic transistors), well short of our 30x goal. To go further, chip architects must consider more radical options of smaller cores in greater numbers, along with innovative ways to coordinate them.
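The 6x figure is just the arithmetic of Table 3's budget, worked out:

```latex
\frac{150\,\text{M transistors}}{25\,\text{M transistors/core}} = 6\ \text{cores}
= 3\times \text{the two such cores a 50M budget allowed in 2008};
\quad
3\times \text{(cores)} \times 2\times \text{(frequency)} = 6\times \ll 30\times.
```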
Looking to achieve this vision, consider three potential approaches to deploying the feasible 150 million logic transistors, as in Table 1. In Figure 9, option (a) is six large cores (good single-thread performance, total potential throughput of six); option (b) is 30 smaller cores (lower single-thread performance, total potential throughput of 13); and option (c) is a hybrid approach (good single-thread performance for low parallelism, total potential throughput of 11).
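These throughput totals can be approximated with Pollack's rule, under which a core's single-thread performance scales roughly as the square root of its transistor count. The sketch below assumes 25-million-transistor large cores, five-million-transistor small cores, and, for the hybrid, a hypothetical split of one large core plus 25 small ones; it reproduces (a) and (b) closely, while the hybrid estimate (~12) only approximates the quoted 11, since the exact configuration and coordination overheads are not spelled out here.

```c
#include <math.h>
#include <stdio.h>

/* Pollack's rule: single-thread performance scales roughly as the
 * square root of a core's transistor count, normalized here to a
 * 25M-transistor 2008-class core = 1.0. */
static double core_perf(double transistors_millions) {
    return sqrt(transistors_millions / 25.0);
}

int main(void) {
    /* Option (a): six 25M-transistor cores. */
    double a = 6 * core_perf(25.0);

    /* Option (b): thirty 5M-transistor cores. */
    double b = 30 * core_perf(5.0);

    /* Option (c), hypothetical split: one 25M-transistor core plus
     * twenty-five 5M-transistor cores (also 150M transistors total). */
    double c = core_perf(25.0) + 25 * core_perf(5.0);

    printf("(a) throughput ~ %.1f\n", a);  /* prints 6.0  */
    printf("(b) throughput ~ %.1f\n", b);  /* prints 13.4 */
    printf("(c) throughput ~ %.1f\n", c);  /* prints 12.2 */
    return 0;
}
```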
Many more variations are possible on this spectrum of core size and number of cores, and the related choices in a multicore processor with a uniform instruction set but heterogeneous implementation are an important part of increasing performance within the transistor budget and energy envelope.
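One way to picture such a uniform-instruction-set, heterogeneous-implementation machine is a runtime that steers work by the parallelism available; the policy below is a hypothetical sketch, not a mechanism from the article.

```c
#include <stdio.h>

/* Hypothetical core-selection policy: serial phases go to a large
 * core for single-thread performance; parallel phases fan out to
 * small cores for throughput per transistor and per watt. */
typedef enum { LARGE_CORE, SMALL_CORE } core_kind;

static core_kind select_core(int runnable_threads) {
    if (runnable_threads <= 1)
        return LARGE_CORE;  /* latency-bound: use the big core */
    return SMALL_CORE;      /* throughput-bound: spread across small cores */
}

int main(void) {
    printf("1 thread   -> %s\n", select_core(1)  == LARGE_CORE ? "large core" : "small cores");
    printf("16 threads -> %s\n", select_core(16) == LARGE_CORE ? "large core" : "small cores");
    return 0;
}
```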
Table 4. Logic organization challenges, trends, directions.

Challenge: Integration and memory model
  Near-term: I/O-based interaction, shared memory spaces, explicit coherence management
  Long-term: Intelligent, automatic data movement among heterogeneous cores, managed by software-hardware partnership

Challenge: Lower-power cores
  Near-term: Heterogeneous cores, vector extensions, and GPU-like techniques to reduce instruction- and data-movement cost
  Long-term: Deeper, explicit storage hierarchy within the core; integrated computation in registers

Challenge: Energy management
  Near-term: Hardware dynamic voltage scaling and intelligent adaptive management, software core selection and scheduling
  Long-term: Predictive core scheduling and selection to optimize energy efficiency and minimize data movement

Challenge: Accelerator variety
  Near-term: Increasing variety, library-based encapsulation (such as DX and OpenGL) for specific domains
  Long-term: Converged accelerators in a few application categories and increasing open programmability for the accelerators

Challenge: Software transparency
  Near-term: Explicit partition and mapping, virtualization, application management
  Long-term: Hardware-based state adaptation and software-hardware partnership for management
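To see why dynamic voltage scaling (the energy-management row above) is such a powerful lever, recall the textbook CMOS dynamic-power relation; this scaling argument is standard background rather than a result from the article:

```latex
P_{\text{dyn}} \approx C V^{2} f,
\qquad
E_{\text{per op}} = \frac{P_{\text{dyn}}}{f} \approx C V^{2}.
```

Because attainable frequency falls roughly linearly with supply voltage, lowering V and f together trades performance away linearly while cutting energy per operation quadratically in V, which is what makes the predictive core scheduling and selection in the table worth its management complexity.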