practice
DOI: 10.1145/1610252.1610270
Article development led by
queue.acm.org
How do we develop software to make
the most of the promise that asymmetric
multicore systems use a lot less energy?
BY ALEXANDRA FEDOROVA, JUAN CARLOS SAEZ,
DANIEL SHELEPOV, AND MANUEL PRIETO
Maximizing Power Efficiency with Asymmetric Multicore Systems
In computing systems, a CPU is usually one of the
largest consumers of energy. For this reason, reducing
CPU power consumption has been a hot topic in
the past few years in both academia
and industry. In the quest to create more power-efficient CPUs, several researchers have proposed an
asymmetric multicore architecture that promises to
save a significant amount of power while delivering
performance similar to that of conventional symmetric
multicore processors.
An asymmetric multicore processor (AMP)
consists of cores that use the same instruction set
architecture (ISA) but deliver different performance
and have different power characteristics. Because all cores use the same ISA,
the same binary can run on any of them,
with no need to compile the code
separately for each core
type. This stands in contrast to heterogeneous-ISA systems, such as IBM’s
Cell or Intel’s Larrabee, where the cores
expose different ISAs, so the code must
be compiled separately for each core
type. Heterogeneous-ISA systems are
not the focus of this article.
A typical AMP consists of several fast
and powerful cores (high clock frequency, complex out-of-order pipeline, and
high power consumption) and a large
number of slower low-power cores (low
clock frequency, simple pipeline, and
low power consumption). Complex and
powerful cores are good for running
single-threaded sequential applications
because these applications cannot speed up by spreading their computation across multiple
simple cores. Abundant simple cores,
on the other hand, are good for running
highly scalable parallel applications.
Because of performance/power
trade-offs between complex and simple
cores, it turns out to be much more efficient to run a parallel application on
a large number of simple cores than
on a smaller number of complex cores
that consume the same power or fit
into the same area. In a similar vein,
complex and powerful cores are good
for running CPU-intensive applications
that effectively use those processors’
advanced microarchitectural features,
such as out-of-order superscalar pipelines, advanced branch prediction facilities, and replicated functional units.
At the same time, simple and slow
cores deliver a better trade-off between
energy consumption and performance
for memory-intensive applications that
spend a majority of their execution time
fetching data from off-chip memory
and stalling the processor.
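
The trade-offs described above can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions, not a measurement of any real processor: it assumes a hypothetical fast core that runs twice as fast as a simple core but draws four times the power, uses Amdahl's law for parallel run time, and assumes that memory stall time is the same on both core types and that a core draws its full power while stalled. The names `FAST`, `SLOW`, `run_time`, and `time_and_energy` are invented for this example.

```python
# Toy model of the fast-core / simple-core trade-off.
# All numbers are hypothetical assumptions chosen for illustration only.

FAST = {"speed": 2.0, "power": 4.0}  # assumed: 2x speed at 4x power
SLOW = {"speed": 1.0, "power": 1.0}  # baseline simple core


def run_time(parallel_fraction, n_cores, core_speed):
    """Amdahl's-law run time for one unit of work on n identical cores."""
    serial = (1.0 - parallel_fraction) / core_speed
    parallel = parallel_fraction / (n_cores * core_speed)
    return serial + parallel


def time_and_energy(compute_work, stall_time, core):
    """Run time and energy on one core.

    Assumes stall time does not shrink on a faster core and that the
    core draws its full power for the whole run, stalls included.
    """
    t = compute_work / core["speed"] + stall_time
    return t, core["power"] * t


# Same power budget (16 units): 4 fast cores versus 16 simple cores.
for p in (0.0, 1.0):
    t_fast = run_time(p, 4, FAST["speed"])
    t_slow = run_time(p, 16, SLOW["speed"])
    print(f"parallel fraction {p}: 4 fast = {t_fast:.4f}, 16 simple = {t_slow:.4f}")

# One thread on one core: CPU-bound versus memory-bound work.
for label, work, stall in (("CPU-bound", 1.0, 0.0), ("memory-bound", 0.1, 0.9)):
    t_f, e_f = time_and_energy(work, stall, FAST)
    t_s, e_s = time_and_energy(work, stall, SLOW)
    print(f"{label}: fast-core speedup {t_s / t_f:.2f}x, energy ratio {e_f / e_s:.2f}x")
```

Under these assumed numbers, the fully parallel workload finishes sooner on the 16 simple cores, the fully sequential one finishes sooner on the fast cores, and the memory-bound thread gains almost nothing from the fast core while consuming several times the energy.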
A symmetric multicore processor
(SMP) includes cores of only one
type: either the complex and powerful ones, as in the Intel Xeon or AMD
Opteron processors, or the simple and