Figure 7. CMP: Comparing two cores to one core. (a) Impact of
doubling the number of cores on performance, power, and energy,
averaged over all four workloads. (b) Energy impact of doubling
the number of cores for each workload. Doubling the cores is not
consistently energy efficient among processors or workloads.
[Figure 7(a) plot. Y-axis: 2 Cores/1 Core, 0.60 to 1.60.]
Figure 8. Scalability of single-threaded Java benchmarks.
Counterintuitively, some single-threaded Java benchmarks scale
well. This is because the underlying JVM exploits parallelism for
compilation, profiling, and garbage collection.
[Figure 7(a) series: Performance, Power, Energy.]
[Figure 8 plot. X-axis: 2 Cores/1 Core, 0.90 to 1.60. Benchmarks: antlr, luindex, db, bloat, jess, compress, mpegaudio, javac, fop, jack.]
[Figure 7(b) plot. Y-axis: 2 Cores/1 Core, 0.60 to 1.20. Workload groups: Native non-scale, Native scale, Java scale. Processor label: i7 (45).]
i7 (45), translating to energy overheads.
More interesting is that Java non-scalable does not incur
energy overhead when enabling another core on the i5 (32).
In fact, we were surprised to find that the reason for this is
that the single-threaded Java non-scalable workload runs
faster with two cores! Figure 8 shows the scalability
of the single-threaded subset of Java non-scalable on the
i7 (45), with SMT disabled, comparing one and two cores.
Although these Java benchmarks are single-threaded, the
JVMs on which they execute are not.
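The link between running faster and avoiding energy overhead follows from energy being average power multiplied by execution time. The sketch below works through that arithmetic for the "2 Cores/1 Core" ratios. It assumes the performance ratio is a speedup (one-core time divided by two-core time); the function name and all input numbers are hypothetical and are not measurements from this study.

```python
def energy_ratio(speedup: float, power_ratio: float) -> float:
    """Return Energy(2 cores) / Energy(1 core).

    Energy = average power x execution time, and the time ratio is
    1 / speedup, so the energy ratio is power_ratio / speedup.
    Assumes both inputs are normalized to the one-core configuration.
    """
    if speedup <= 0 or power_ratio <= 0:
        raise ValueError("ratios must be positive")
    return power_ratio / speedup


# Hypothetical (speedup, power ratio) pairs, for illustration only.
examples = {
    "no speedup, more power": (1.00, 1.10),
    "speedup matches power": (1.10, 1.10),
    "speedup exceeds power": (1.50, 1.20),
}

for label, (speedup, power) in examples.items():
    ratio = energy_ratio(speedup, power)
    verdict = "overhead" if ratio > 1.0 else "no overhead"
    print(f"{label}: energy ratio = {ratio:.2f} ({verdict})")
```

Under these assumptions, a second core costs no extra energy whenever the speedup is at least as large as the increase in power, which is the situation the text describes for a workload that unexpectedly runs faster on two cores.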
Figure 9. SMT: One core with and without SMT. (a) Impact of enabling
two-way SMT on a single core with respect to performance, power, and
energy, averaged over all four workloads. (b) Energy impact of enabling
two-way SMT on a single core for each workload. Enabling SMT delivers
significant energy savings on the recent i5 (32) and the in-order Atom (45).
[Figure 9(a) plot. Y-axis: 2 Threads/1 Thread, 0.60 to 1.60. Series: Performance, Power, Energy. Processors: Pentium 4 (130), i7 (45), Atom (45), i5 (32).]
Finding: The JVM induces parallelism into the execution of single-threaded Java benchmarks.
[Figure 9(b) plot. Axis: 2 Threads/1 Thread, 0.60 to 1.20. Workload groups: Native non-scale, Native scale, Java non-scale, Java scale. Processors: Pentium 4 (130), i7 (45), Atom (45), i5 (32).]
Since virtual machine runtime services for managed languages, such as just-in-time (JIT) compilation, profiling, and
garbage collection, are often concurrent and parallel, they
provide substantial scope for parallelization, even within
ostensibly sequential applications. We instrumented the
HotSpot JVM and found that its JIT compilation and garbage collection are parallel. Detailed performance counter
measurements revealed that the garbage collector induced
memory system improvements with more cores by reducing
the collector’s displacement effect on the application thread.
5.2. Simultaneous multithreading
Figure 9 shows the effect of disabling simultaneous
multithreading (SMT) [19] on the Pentium 4 (130), Atom (45), i5 (32),
and i7 (45). Each processor supports two-way SMT. SMT
provides fine-grain parallelism to distinct threads in the
processors' issue logic, and in modern implementations, threads
share all processor components (e.g., execution units and
caches). Singhal states that the small amount of logic
exclusive to SMT consumes very little power [18]. Nonetheless, this
logic is integrated, so SMT contributes a small amount to
total power even when disabled. Our results therefore slightly
underestimate the power cost of SMT. We use only one core,