uler on an AMD Opteron system featuring eight cores—four per memory
domain. The results are shown in Figure 8. The performance improvement
relative to default has been computed
as the average improvement for all applications in the workload (since not
all applications are memory intensive,
some do not improve). We can see that
DIO renders workload-average performance improvements of up to 11%.
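As a minimal sketch of how a workload-average figure of this kind can be computed, the snippet below averages per-application improvements. It assumes improvement is measured as the percentage reduction in completion time relative to the default scheduler; the function names and all timing values are hypothetical and are not taken from the experiments.

```python
from statistics import mean


def percent_improvement(t_default: float, t_dio: float) -> float:
    """Percentage reduction in completion time relative to the default scheduler."""
    return 100.0 * (t_default - t_dio) / t_default


def workload_average_improvement(times: dict[str, tuple[float, float]]) -> float:
    """Average the per-application improvements over the whole workload."""
    return mean(percent_improvement(d, s) for d, s in times.values())


# Hypothetical completion times in seconds: (default scheduler, DIO).
workload = {
    "app_a": (500.0, 400.0),  # memory intensive, improves by 20%
    "app_b": (400.0, 340.0),  # memory intensive, improves by 15%
    "app_c": (300.0, 300.0),  # not memory intensive, no change
    "app_d": (200.0, 198.0),  # not memory intensive, improves by 1%
}

avg = workload_average_improvement(workload)
print(f"workload-average improvement: {avg:.1f}%")
```

Because applications that are not memory intensive contribute values near zero, the average is pulled well below the gain seen by the memory-intensive applications alone.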
Figure 9. Performance of eight workloads under DIO relative to the default Linux scheduler. (Bar chart; y-axis: improvement over default; x-axis: workloads WL1 through WL8.)
Figure 10. Worst-case performance for each of the applications included as part of the eight test workloads. (Bar chart; y-axis: improvement over default; x-axis: applications grouped by workload, WL1 through WL8.)
Figure 11. Percentage reduction in EDP. (Series: Spread, Power DI; chart annotation: "Better than cluster"; x-axis: fraction of memory-intensive applications in the workload, 0% to 100%.)
Another potential use of DIO is as a way to ensure QoS (quality of service)
for critical applications, since DIO essentially provides a means to make sure
the worst scheduling assignment is never selected, while the default scheduler
may occasionally suffer as a consequence of a bad thread placement.
Figure 9 shows, for each of the applications in the eight test workloads,
its worst-case performance under DIO relative to its worst-case performance
under the default Linux scheduler. The numbers are shown as the percentage
improvement of the worst-case behavior achieved under DIO relative to that
encountered with the default Linux scheduler, so higher bars in this case
are better. We can see that some applications are as much as 60% to 80%
better off with their worst-case DIO execution times, and in no case
did DIO do significantly worse than the default scheduler.
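The worst-case comparison described above can be sketched as follows. This is an illustration under stated assumptions: the worst case is taken to be the longest execution time observed across repeated runs, and the run times below are hypothetical rather than measured.

```python
def worst_case_improvement(default_runs: list[float], dio_runs: list[float]) -> float:
    """Percentage improvement of the worst (longest) run under DIO
    relative to the worst run under the default scheduler."""
    worst_default = max(default_runs)
    worst_dio = max(dio_runs)
    return 100.0 * (worst_default - worst_dio) / worst_default


# Hypothetical execution times in seconds for one application over several runs.
default_runs = [310.0, 330.0, 500.0]  # one run hit a bad thread placement
dio_runs = [305.0, 312.0, 320.0]      # placements stay close to each other

gain = worst_case_improvement(default_runs, dio_runs)
print(f"worst-case improvement: {gain:.1f}%")
```

A positive value means the worst run under DIO was shorter than the worst run under the default scheduler, which corresponds to a higher bar in the figure.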
Power Distributed Intensity. One of
the most effective ways to reduce CPU
power consumption is to turn off unused cores or entire memory domains
in an active system. Similarly, if the
workload is running on multiple machines—for example, in a data center—
power savings can be accomplished by
clustering the workload on as few servers as possible while powering down
the rest. This seemingly simple solution is a double-edged sword, however,
because clustering the applications on
just a few systems may cause them to
compete for shared system resources
and thus suffer performance loss. As
a result, more time will be needed to
complete the workload, meaning that
more energy will be consumed. In an
attempt to save power, it is therefore also necessary to consider the impact that clustering can have on performance. A metric
that takes into account both the energy
consumption and the performance
of the workload is the energy-delay
product (EDP). 4 Based on our findings
about contention-aware scheduling,
we designed Power DI, a scheduling algorithm meant to save power without
hurting performance.
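To make the trade-off concrete, here is a small sketch of the energy-delay product, using its standard definition as energy multiplied by execution time. The power and runtime values are hypothetical, chosen only to illustrate how clustering can lower energy yet still raise EDP; they are not results from this work.

```python
def energy_delay_product(avg_power_watts: float, runtime_s: float) -> float:
    """EDP = energy * delay, where energy = average power * runtime."""
    energy_joules = avg_power_watts * runtime_s
    return energy_joules * runtime_s


# Hypothetical scenario: the same workload placed two different ways.
# Clustered: fewer machines powered on, but contention stretches the runtime.
clustered = energy_delay_product(avg_power_watts=150.0, runtime_s=140.0)
# Spread: more machines powered on, but no contention.
spread = energy_delay_product(avg_power_watts=250.0, runtime_s=100.0)

# Percentage reduction in EDP of the spread placement relative to clustering.
reduction = 100.0 * (clustered - spread) / clustered
print(f"clustered EDP: {clustered:.0f} J*s, spread EDP: {spread:.0f} J*s")
print(f"EDP reduction relative to clustering: {reduction:.1f}%")
```

In this made-up case clustering uses less energy (21,000 J versus 25,000 J) but has the higher EDP, because the runtime penalty counts twice in the product.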
Power DI works as follows: Assuming a centralized scheduler has knowledge of the entire computing infrastructure and distributes incoming
applications across all systems, Power
DI clusters all incoming applications
on as few machines as possible, except
for those applications deemed to be
memory intensive. Similarly, within a
single machine, Power DI clusters applications on as few memory domains
as possible, with the exception of memory-intensive applications. These applications are not co-scheduled on the
same memory domain with another application unless the other application
has a very low cache miss rate (and thus
a low memory intensity). To determine
if an application is memory-intensive,