workload and we offer no commentary on the virtue of language choice. We create four workloads from 61 benchmarks.
Native non-scalable: C, C++, and Fortran single-threaded
compute-intensive benchmarks from SPEC CPU2006.
Native scalable: Multithreaded C and C++ benchmarks from
Java non-scalable: Single and multithreaded benchmarks
that do not scale well from SPECjvm, DaCapo 06-10-MR2,
DaCapo 9.12, and pjbb2005.
Java scalable: Multithreaded Java benchmarks from DaCapo
9.12 that scale in performance similarly to native scalable
on the i7 (45).
We execute the Java benchmarks on the Oracle HotSpot 1.6.0
virtual machine because it is a mature high-performance
virtual machine. The virtual machine dynamically optimizes
each benchmark on each architecture. We use best practices for
virtual machine measurement of steady state performance.
We compile the native non-scalable workload with icc at -O3.
We use gcc at -O3 for the native scalable workload because
icc did not correctly compile all benchmarks. The icc compiler
generates better-performing code than gcc. We execute the
same native binaries on all machines. All the parallel native
benchmarks scale up to eight hardware contexts. The Java scalable workload is the subset of Java benchmarks that scale well.
3.2. Hardware
Table 1 lists our eight Intel IA32 processors, which cover four
process technologies (130nm, 65nm, 45nm, and 32nm)
and four microarchitectures (NetBurst, Core, Bonnell, and
Nehalem). The release price and date give context regarding
Intel’s market placement. The Atoms and the Core 2Q (65)
Kentsfield are extreme market points. These processors are
only examples of many processors in each family. For example,
Intel sells over 60 Nehalems at 45 nm, ranging in price from
around $190 to over $3700. We believe that these samples are
representative because they were sold at similar price points.
To explore the influence of architectural features, we selectively down-clock the processors, disable cores on these chip
multiprocessors (CMP), disable simultaneous multithreading (SMT), and disable Turbo Boost using BIOS configuration.
3.3. Power, performance, and energy measurement
We isolate the direct current (DC) power supply to the processor on the motherboard and measure its current with
Pololu’s ACS714 current sensor board. The supply voltage
is very stable, varying by less than 1%, which enables us to
calculate power accurately from the measured current. Prior work used a clamp ammeter, which can only measure alternating current (AC), and is
therefore limited to measuring the whole system [10, 12]. After we published the original paper, Intel made chip-level and
core-level energy measurements available on Sandy Bridge processors [4]. Our methodology should slightly overstate chip
power because it includes losses due to the motherboard’s
voltage regulator. Validating against the Sandy Bridge energy
counter shows that our measurements consistently
report about 5% more current.
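A minimal sketch of this measurement chain follows: it converts sensor readings to current, then to average power and energy. The sensitivity and zero-current offset are assumptions taken from the ACS714 datasheet for the ±5 A variant (this section does not state which variant was used), and the function names are hypothetical.

```python
from statistics import mean

# Assumed sensor constants (ACS714 +/-5 A variant, 5 V sensor supply).
# The text does not specify the variant, so treat these as placeholders.
SENSITIVITY_V_PER_A = 0.185  # sensor output volts per ampere of load current
ZERO_CURRENT_V = 2.5         # sensor output voltage when no current flows


def sensor_to_current(v_out):
    """Convert one sensor output voltage (V) to load current (A)."""
    return (v_out - ZERO_CURRENT_V) / SENSITIVITY_V_PER_A


def average_power(sensor_volts, supply_volts):
    """Average power (W) over a run.

    The supply voltage is treated as a constant, which is reasonable only
    because it is reported to vary by less than 1%.
    """
    avg_current = mean(sensor_to_current(v) for v in sensor_volts)
    return supply_volts * avg_current


def energy(avg_power_w, exec_time_s):
    """Energy (J) = average power (W) x execution time (s)."""
    return avg_power_w * exec_time_s
```

For instance, `average_power(samples, 12.0)` would treat the processor rail as a 12 V supply; that voltage is an illustrative assumption, not a value given in the text.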
We execute each benchmark multiple times on every
architecture, log its power values, and then compute average
power consumption. The aggregate 95% confidence intervals of execution time and power range from 0.7% to 4%. The
measurement error in time and power for all processors and
benchmarks is low. We compute arithmetic means over the
four workloads, weighting each workload equally. To avoid
biasing performance measurements to any one architecture, we compute a reference performance for each benchmark by averaging the execution time on four architectures:
Pentium 4 (130), Core 2D (65), Atom (45), and i5 (32). These
choices capture four microarchitectures and four technology generations. We also normalize energy to a reference,
since energy = power × time. The reference energy is the average benchmark power on the four processors multiplied by
their average execution time.
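The normalization described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' scripts: the function names and dictionary layout are hypothetical, and expressing normalized performance as reference time divided by measured time is an assumed convention.

```python
from statistics import mean

# The four reference processors named in the text.
REFERENCE = ["Pentium 4 (130)", "Core 2D (65)", "Atom (45)", "i5 (32)"]


def reference_point(times, powers):
    """Reference time and energy for one benchmark.

    times / powers map processor name -> execution time (s) / average power (W).
    Reference energy = average power on the four processors x their average time.
    """
    ref_time = mean(times[p] for p in REFERENCE)
    ref_energy = mean(powers[p] for p in REFERENCE) * ref_time
    return ref_time, ref_energy


def normalize(time_s, power_w, ref_time, ref_energy):
    """Normalized performance and energy for one benchmark on one processor."""
    perf = ref_time / time_s                      # assumed convention: >1 is faster
    norm_energy = (power_w * time_s) / ref_energy  # energy = power x time
    return perf, norm_energy


def workload_mean(per_workload):
    """Arithmetic mean over workloads, weighting each workload equally.

    per_workload maps workload name -> list of per-benchmark normalized values.
    """
    return mean(mean(values) for values in per_workload.values())
```

Averaging within each workload first and then across workloads keeps a workload with many benchmarks from dominating the overall mean.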
We measure the 45 processor configurations (8 stock
and 37 BIOS configurations) and produce power and
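Table 1. Specifications for the eight processors used in the experiments.

| Processor | µArch | Codename | sSpec | Release date | Price (USD) | CMP/SMT | LLC (B) | Clock (GHz) | nm | Trans (M) | Die (mm²) | VID range (V) | TDP (W) | FSB (MHz) | B/W (GB/s) | DRAM model |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | May ’03 | – | 1C2T | 512K | 2.4 | 130 | 55 | 131 | – | 66 | 800 | – | DDR- |
| Core 2 Duo | | | SL9S8 | Jul ’06 | 316 | 2C1T | 4M | 2.4 | 65 | 291 | 143 | | 65 | 1066 | – | DDR2- |
| | | | | Jan ’07 | 851 | 4C1T | 8M | 2.4 | 65 | 582 | 286 | | 105 | 1066 | – | DDR2- |
| | Nehalem | Bloomfield | SLBCH | Nov ’08 | 284 | 4C2T | 8M | 2.7 | 45 | 731 | 263 | | 130 | – | 25.6 | DDR3- |
| | Bonnell | Diamondville | SLB6Z | Jun ’08 | 29 | 1C2T | 512K | 1.7 | 45 | 47 | 26 | | 4 | 533 | – | DDR2- |
| | | | | May ’09 | 133 | 2C1T | 3M | 3.1 | 45 | 228 | | | 65 | 1066 | – | DDR2- |
| | | | | Dec ’09 | 63 | 2C2T | 1M | 1.7 | 45 | 176 | | | 13 | 665 | – | DDR2- |
| | Nehalem | Clarkdale | SLBLT | Jan ’10 | 284 | 2C2T | 4M | 3.4 | 32 | 382 | | | 73 | – | 21.0 | DDR3- |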