Looking Back and Looking
Forward: Power, Performance,
The past 10 years have delivered two significant revolutions. (1) Microprocessor design has been transformed by
the limits of chip power, wire latency, and Dennard scaling, leading to multicore processors and heterogeneity. (2)
Managed languages and an entirely new software landscape
emerged, revolutionizing how software is deployed, is sold,
and interacts with hardware. Researchers most often examine these changes in isolation. Architects mostly grapple
with microarchitecture design through the narrow software
context of native sequential SPEC CPU benchmarks, while
language researchers mostly consider microarchitecture in
terms of performance alone. This work explores the clash
of these two revolutions over the past decade by measuring power, performance, energy, and scaling, and considers
what the results may mean for the future. Our diverse findings include the following: (a) native sequential workloads
do not approximate managed workloads or even native
parallel workloads; (b) diverse application power profiles
suggest that future applications and system software will
need to participate in power optimization and management;
and (c) software and hardware researchers need access to
real measurements to optimize for power and energy.
Quantitative performance analysis is the foundation for
computer system design and innovation. In their classic
paper, Emer and Clark noted that “A lack of detailed timing information impairs efforts to improve performance.”
They pioneered the quantitative approach by characterizing
instruction mix and cycles per instruction on time-sharing
workloads. They surprised expert reviewers by demonstrating a
gap between the theoretical 1 MIPS peak of the VAX-11/780
and the 0.5 MIPS it delivered on real workloads. Industry and
academic researchers in software and hardware all use and
extend this principled performance analysis methodology.
Our research applies this quantitative approach to measured
power. This work is timely because the past decade heralded
the era of power- and energy-constrained hardware design.[a]
Furthermore, demand for energy efficiency has intensified
in large-scale systems, in which energy began to dominate
costs, and in mobile systems, which are limited by battery
life. A lack of detailed energy measurements is impairing
efforts to reduce energy consumption on modern workloads.
[a] Energy = power × execution time.
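The relation energy = power × execution time can be made concrete with a minimal sketch. The function names and all numbers below are hypothetical, chosen only for illustration; they are not measurements from this work. The second function assumes power is sampled at a fixed interval and integrates the samples with the trapezoidal rule.

```python
def energy_joules(avg_power_watts, exec_time_seconds):
    """Energy (J) from average power (W) and execution time (s)."""
    return avg_power_watts * exec_time_seconds


def energy_from_samples(samples_watts, interval_seconds):
    """Integrate evenly spaced power samples with the trapezoidal rule."""
    pairs = zip(samples_watts, samples_watts[1:])
    return sum(0.5 * (a + b) * interval_seconds for a, b in pairs)


# Two hypothetical runs of the same task (invented numbers).
slow_low_power = energy_joules(avg_power_watts=20.0, exec_time_seconds=10.0)
fast_high_power = energy_joules(avg_power_watts=60.0, exec_time_seconds=4.0)

# Hypothetical power trace sampled once per second.
trace_energy = energy_from_samples([10.0, 20.0, 30.0], interval_seconds=1.0)
```

In this invented example the faster run finishes in less than half the time yet consumes more energy, which is why performance alone cannot stand in for an energy measurement.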
Society has benefited enormously from exponential
hardware performance improvements. Moore observed that
transistors will be smaller and more numerous in each new
generation [15]. For a long time, this simple rule of integrated
circuit fabrication came with an exponential and transparent performance dividend. Shrinking a transistor lowers
its gate delay, which raises the processor’s theoretical clock
speed (Dennard scaling [3]). Until recently, shrinking transistors delivered corresponding clock speed increases and more
transistors in the same chip area. Architects used the transistor bounty to add memory, prefetching, branch prediction,
multiple instruction issue, and deeper pipelines. The result
was exponential single-threaded performance improvements.
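The scaling argument in this paragraph can be written out in its usual textbook form. This is an idealized sketch of constant-field (Dennard) scaling, not a derivation taken from this work; the symbols (scale factor \kappa, switched capacitance C, supply voltage V, clock frequency f, area per transistor A) are notation assumed here for illustration.

```latex
% Assumed notation (not from the paper): scale factor \kappa > 1,
% C = switched capacitance, V = supply voltage, f = clock frequency,
% A = area per transistor. Primed symbols are values after one scaling step.
\begin{align*}
P &= C V^{2} f && \text{(dynamic power per transistor)} \\
C' &= C/\kappa, \quad V' = V/\kappa, \quad f' = \kappa f && \text{(one ideal scaling step)} \\
P' &= \frac{C}{\kappa} \cdot \frac{V^{2}}{\kappa^{2}} \cdot \kappa f = \frac{P}{\kappa^{2}} \\
\frac{P'}{A'} &= \frac{P/\kappa^{2}}{A/\kappa^{2}} = \frac{P}{A} && \text{(power density unchanged)}
\end{align*}
```

Under these idealized assumptions, each generation fits \kappa^{2} more transistors in the same area and runs them \kappa times faster at the same total chip power. The result depends on the supply voltage continuing to scale down with feature size.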
Unfortunately, physical power and wire-delay limits
will derail the clock speed bounty of Moore’s law in current and future technologies. Power is now a first-order
hardware design constraint in all market segments.
Power constraints now severely limit clock scaling and
prevent using all transistors simultaneously [6, 8, 16].
In addition, the physical limitations of wires prevent single-cycle
access to a growing number of the transistors on a chip.
To effectively use more transistors at smaller technologies,
these limits forced manufacturers to turn to chip multiprocessors (CMPs) and recently to heterogeneous parallel
systems that seek power efficiency through specialization. Parallel heterogeneous hardware requires parallel
software and exposes software developers to ongoing
hardware upheaval. Unfortunately, most software today
is not parallel, nor is it designed to modularly decompose
onto a heterogeneous substrate.
Moore’s transistor bounty also drove orthogonal and
disruptive changes in how software is deployed, is sold, and
interacts with hardware over this same decade. Demands for
correctness, complexity management, programmer productivity,
time-to-market, reliability, security, and portability
pushed developers away from low-level compiled ahead-of-time
(native) programming languages. Developers increasingly
Original version: “Looking Back on the Language and Hardware
Revolutions: Measured Power, Performance, and Scaling,” ACM
Conference on Architecture Support for Programming
Languages and Operating Systems, pp. 319–332, Newport
Beach, CA, 2011. Also published as “What Is Happening to
Power, Performance and Software?,” IEEE Micro Top Picks
from the Computer Architecture Conferences of 2011,
May/June 2012, IEEE Computer Society.