choose high-level managed programming languages with safe pointer disciplines, garbage collection (automatic memory management), extensive standard libraries, and dynamic just-in-time compilation for hardware portability. For example, modern Web services combine managed languages, such as PHP on the server side and JavaScript on the client side. In markets as diverse as financial software
and cell phone applications, Java and .NET are the dominant choices. The exponential performance improvements
provided by hardware hid many of the costs of high-level
languages and helped create a virtuous cycle with ever more
capable and high-level software. This ecosystem is resulting in an explosion of developers, software, and devices that
continue to change how we live and learn.
Unfortunately, a lack of power measurements is impairing efforts to reduce energy consumption on traditional and emerging platforms.
2. OVERVIEW
Our work quantitatively examines power, performance,
and scaling during this period of disruptive software and
hardware changes (2003–2011). Voluminous research
explores performance analysis and a growing body of work
explores power (see Section 6), but our work is the first to
systematically measure the power, performance, and energy
characteristics of software and hardware across a range of
processors, technologies, and workloads.
We execute 61 diverse sequential and parallel benchmarks written in three native languages and one managed
language, all widely used: C, C++, Fortran, and Java. We
choose Java because it has mature virtual machine technology and substantial open source benchmarks. We choose
eight representative Intel IA32 processors from five technology generations (130 nm to 32 nm). Each processor has
an isolated power supply with stable voltage on the motherboard; we attach a Hall effect sensor to this supply to measure current, and hence processor
power. We calibrate and validate our sensor data. We find
that power consumption varies widely among benchmarks.
Furthermore, relative performance, power, and energy are
not well predicted by core count, clock speed, or reported
Thermal Design Power (TDP). TDP is the nominal amount
of power the chip is designed to dissipate (i.e., without
exceeding the maximum transistor junction temperature).
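The sensor-to-power conversion described above can be sketched as follows. The sensitivity, zero-current offset, and 12 V supply voltage here are illustrative assumptions, not the paper's calibrated values.

```python
# Hypothetical sketch of converting Hall effect sensor readings on an
# isolated processor supply rail into processor power. All constants
# below (supply voltage, sensitivity, offset) are assumed for
# illustration, not taken from the paper's calibration.

def sensor_to_power(sensor_volts, supply_volts=12.0,
                    sensitivity_v_per_a=0.1, offset_v=2.5):
    """Convert one Hall sensor reading (V) to processor power (W)."""
    current_a = (sensor_volts - offset_v) / sensitivity_v_per_a
    return supply_volts * current_a

def average_power(samples, **kwargs):
    """Average power over a trace of sensor voltage samples."""
    return sum(sensor_to_power(s, **kwargs) for s in samples) / len(samples)

# Example: samples of 2.9, 3.0, 3.1 V map to 4, 5, 6 A on a 12 V rail,
# i.e., 48, 60, 72 W, which averages to roughly 60 W.
trace = [2.9, 3.0, 3.1]
print(round(average_power(trace), 6))
```

In practice, such a conversion would be applied per-sample at the sensor's sampling rate and then integrated over a benchmark run to obtain energy.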
Using controlled hardware configurations, we explore the
energy impact of hardware features and workload. We perform historical and Pareto analyses that identify the most
power- and performance-efficient designs in our architecture
configuration space. We make all of our data publicly available in the ACM Digital Library as a companion to our original ASPLOS 2011 paper. Our data quantifies a large number
of workload and hardware trends with precision and depth,
some known and many previously unreported. This paper
highlights eight findings, which we list in Figure 1. Two
themes emerge from our analysis: workload and architecture.
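A Pareto analysis of the kind described above keeps only the configurations that no other configuration beats on both axes, i.e., higher performance and lower power. A minimal sketch, using made-up configurations rather than the paper's measurements:

```python
# Minimal Pareto-frontier sketch: a configuration is on the frontier if
# no other configuration is at least as good on both performance and
# power, and strictly better on one. Data points are illustrative only.

def pareto_frontier(configs):
    """configs: (name, performance, power) tuples; higher performance
    and lower power are better. Returns names of non-dominated configs."""
    def dominates(a, b):
        # a dominates b: no worse on both axes, strictly better on one
        return (a[1] >= b[1] and a[2] <= b[2]) and \
               (a[1] > b[1] or a[2] < b[2])
    return [c[0] for c in configs
            if not any(dominates(o, c) for o in configs if o is not c)]

# Hypothetical (name, relative performance, power in W) configurations.
configs = [("A", 1.0, 30), ("B", 2.0, 60), ("C", 1.5, 80), ("D", 2.5, 90)]
print(pareto_frontier(configs))  # C is dominated by B
```

Running the frontier computation separately for each workload class is what exposes the workload sensitivity reported in the findings.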
Workload. The power, performance, and energy trends of native workloads differ substantially from those of managed and parallel native workloads. For example, (a) the SPEC CPU2006 native benchmarks draw significantly less power than parallel benchmarks and (b) managed runtimes exploit parallelism even when executing single-threaded applications. The results recommend that systems researchers include managed and native, sequential and parallel workloads when designing and evaluating energy-efficient systems.

Figure 1. Eight findings from an analysis of measured chip power, performance, and energy on 61 workloads and eight processors. The ASPLOS paper includes more findings and analysis.
(1) Power consumption is highly application dependent and is poorly correlated with TDP.
(2) Power per transistor is relatively consistent within a microarchitecture family, independent of process technology.
(3) Energy-efficient architecture design is very sensitive to workload. Configurations in the native non-scalable Pareto frontier substantially differ from those for all the other workloads.
(4) Comparing one core to two, enabling a core is not consistently energy efficient.
(5) The Java virtual machine induces parallelism into the execution of single-threaded Java benchmarks.
(6) Simultaneous multithreading delivers substantial energy savings for recent hardware and for in-order processors.
(7) Two recent die shrinks deliver similar and surprising reductions in energy, even when controlling for clock frequency.
(8) Controlling for technology, hardware parallelism, and clock speed, the out-of-order architectures have energy efficiency similar to that of the in-order ones.
Architecture. Hardware features such as clock scaling,
gross microarchitecture, simultaneous multithreading, and
chip multiprocessors each elicit a huge variety of power,
performance, and energy responses. This variety and the
difficulty of obtaining power measurements recommend
exposing on-chip power meters and, when possible, power
meters for individual structures, such as cores and caches.
Modern processors include power management techniques
that monitor power sensors to minimize power usage and
boost performance. However, only in 2011 (after our original paper) did Intel first expose energy counters in its production Sandy Bridge processors. Just as hardware event
counters provide a quantitative grounding for performance
innovations, future architectures should include power and/or
energy meters to drive innovation in the power-constrained
computer systems era.
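On Linux, energy counters like those Sandy Bridge introduced are exposed through the powercap interface as a monotonically increasing microjoule counter that wraps around. The sketch below derives average power from two readings; the default wrap range and the exact sysfs path are assumptions that vary across kernels and parts.

```python
# Hedged sketch of using on-chip energy counters via Linux powercap
# (/sys/class/powercap/intel-rapl:0/energy_uj). The counter counts
# microjoules and wraps; the 2**32 uJ default range is an assumption --
# the real range is in max_energy_range_uj and varies by processor.

def average_watts(e0_uj, e1_uj, seconds, max_range_uj=2**32):
    """Average package power between two counter readings,
    handling at most one counter wraparound."""
    delta_uj = e1_uj - e0_uj
    if delta_uj < 0:                 # counter wrapped between readings
        delta_uj += max_range_uj
    return delta_uj / 1e6 / seconds  # uJ -> J, then J/s = W

def read_energy_uj(path="/sys/class/powercap/intel-rapl:0/energy_uj"):
    """Read the current counter; requires a machine exposing RAPL."""
    with open(path) as f:
        return int(f.read())

# Synthetic example: 45 J consumed over 1.5 s is an average of 30 W.
print(average_watts(10_000_000, 55_000_000, 1.5))
```

Note that such counters report energy for a whole power domain (e.g., the package); per-structure meters for cores and caches, as argued above, would require finer-grained instrumentation.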
3. METHODOLOGY
Measurement is key to understanding and optimization.
This section presents an overview of essential elements of
our methodology. We refer the reader to the original paper
for a more detailed treatment.
3.1. Software
We systematically explore workload selection and show that it
is a critical component for analyzing power and performance.
Native and managed applications embody different trade-offs
between performance, reliability, portability, and deployment.
It is impossible to meaningfully separate language from