(SPEC) CPU 2006 benchmark suite6—
run simultaneously on an Intel Quad-Core Xeon system similar to the one
depicted in Figure 1.
As a test, we ran this group of applications
several times, in three different
schedules, each time with two different
pairings sharing a memory domain.
The three pairing permutations afforded
each application an opportunity
to run with each of the other three
applications within the same memory domain:
• Soplex and Sphinx ran in a memory
domain, while Gamess and Namd
shared another memory domain.
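The reason there are exactly three schedules can be checked by enumeration: fixing one application and choosing its partner determines the other pair. The following is a minimal sketch, not the authors' tooling; the function name `domain_pairings` is hypothetical, and the sketch assumes the two memory domains are interchangeable, so swapping the pairs does not count as a new schedule.

```python
def domain_pairings(apps):
    """Return every way to split four applications into two pairs,
    one pair per memory domain. The domains are assumed to be
    interchangeable, so swapping the two pairs is not a new schedule."""
    if len(apps) != 4:
        raise ValueError("this sketch handles exactly four applications")
    first, *rest = apps
    pairings = []
    for partner in rest:
        # Fix the first application and pick its partner; the two
        # remaining applications necessarily share the other domain.
        pair_a = (first, partner)
        pair_b = tuple(a for a in rest if a != partner)
        pairings.append((pair_a, pair_b))
    return pairings


apps = ["Soplex", "Sphinx", "Gamess", "Namd"]
schedules = domain_pairings(apps)
for domain_a, domain_b in schedules:
    print(domain_a, "|", domain_b)
```

Across the three schedules, each application shares a domain with each of the other three exactly once.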
Figure 1. A schematic view of a multicore system with two memory domains representing
the architecture of Intel Quad-Core Xeon processors.
Figure 2. Percentage of performance degradation over a solo run achieved in
two different scheduling assignments: the best and the worst. The lower the bar,
the better the performance.
Performance is reported as the
percentage of degradation from solo
execution time (when the application
ran alone on the system), meaning that
the lower the numbers, the better the performance.
There is a dramatic difference between the best and the worst schedules,
as shown in the figure. The workload as
a whole performed 20% better with the
best schedule than with the worst, while gains for the
individual applications Soplex and Sphinx were
as great as 50%. This indicates a clear
incentive for assigning applications to
cores according to the best possible
schedule. While a contention-oblivious scheduler might accidentally happen upon the best schedule, it could
just as well run the worst schedule. A
contention-aware scheduler, on the other hand, would be better positioned to
choose a schedule that performs well.
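The comparison above rests on one metric: degradation relative to a solo run. The following is a minimal sketch of that metric and of ranking schedules by it. The names `degradation_pct` and `rank_schedules` are hypothetical, averaging degradation across applications is an assumed way of summarizing a workload (the article does not say how the aggregate was computed), and the timings are invented placeholders, not measurements.

```python
def degradation_pct(solo_time, co_run_time):
    """Percentage slowdown relative to the solo run; lower is better."""
    return 100.0 * (co_run_time - solo_time) / solo_time


def rank_schedules(solo, co_run):
    """Order schedule names from least to most mean degradation.

    solo:   {app: solo execution time}
    co_run: {schedule name: {app: execution time under that schedule}}
    Using the mean across applications is an assumption of this sketch.
    """
    def mean_degradation(times):
        total = sum(degradation_pct(solo[app], t) for app, t in times.items())
        return total / len(times)

    return sorted(co_run, key=lambda name: mean_degradation(co_run[name]))


# Placeholder timings in seconds, invented purely for illustration.
solo = {"app_a": 100.0, "app_b": 200.0}
co_run = {
    "schedule_1": {"app_a": 110.0, "app_b": 220.0},
    "schedule_2": {"app_a": 150.0, "app_b": 260.0},
}

ranked = rank_schedules(solo, co_run)
print("best:", ranked[0], "worst:", ranked[-1])
```

A contention-oblivious scheduler is equally likely to land on any entry of `ranked`, whereas a contention-aware one would aim for the first.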
This article describes an investigation of a thread scheduler that would
mitigate resource contention on multicore processors. Although we began
this investigation using an analytical
modeling approach that would be difficult to implement online, we ultimately arrived at a scheduling method
that can be easily implemented online
with a modern operating system or
even prototyped at the user level. To
share a complete understanding of the
problem, we describe both the offline
and online modeling approaches. The
article concludes with some actual performance data that shows the impact
contention-aware scheduling techniques can have on the performance
of applications running on currently
available multicore systems.
To make this study tractable we
made the assumption that the threads
do not share any data (that is, they belong either to different applications
or to the same application where each
thread works on its own data set). If
threads share data, they may actually
Figure 3. Mcf (a) is an application with rather poor temporal locality, hence the low reuse frequency and high miss frequency.
Povray (b) has excellent temporal locality. Milc (c) rarely reuses its data, therefore showing a very low reuse frequency and
a very high miss frequency.