successful at modeling contention
largely because they manage to capture
two important qualities related to contention: sensitivity and intensity. Sensitivity measures how much a thread suffers whenever it shares the cache with
other threads. Intensity, on the other
hand, measures how much a thread
hurts other threads whenever it shares
a cache with them. Measuring sensitivity and intensity appealed to us because
together they capture the key information contained within memory-reuse
profiles; we also had some ideas about
how they could be approximated using online performance data. Before
learning how to approximate sensitivity and intensity, however, we needed
to confirm that these were indeed good
bases for modeling cache contention
among threads. To accomplish that,
we formally derived the sensitivity and
intensity metrics based on data in the
memory-reuse profiles. After confirming that the metrics derived in this way
did indeed accurately model contention, we could then attempt to approximate them using just online data.
Accordingly, we derived sensitivity S
and intensity Z for an application using
data from its memory-reuse profile. To
compute S, we applied an aggregation
function to the reuse-frequency histogram. Intuitively, the higher the reuse frequency, the more an application is likely to suffer from the loss of cache space due to contention with another application—signifying a higher sensitivity. To compute Z, we simply used
the cache-access rate, which can be in-
ferred from the memory-reuse profile.
Intuitively, the higher the access rate, the greater the competition from the thread in question, since a high access rate indicates that it allocates new cache lines while retaining old ones. Details of the derivation of S and Z are described in another article.10
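To make the two metrics concrete, here is a small sketch of deriving S and Z from a memory-reuse profile. The actual aggregation function for S is defined in the cited article; the weighting below (shorter reuse distances count more) is an assumption for illustration only, as is the histogram representation.

```python
# Illustrative sketch only: the real aggregation function for S is derived in
# the cited article. Here we assume buckets with higher reuse frequency
# (shorter reuse distance) contribute more to sensitivity, since tightly
# reused lines suffer most when cache space is lost to a co-runner.

def sensitivity(reuse_histogram):
    """reuse_histogram maps reuse distance -> number of accesses."""
    total = sum(reuse_histogram.values())
    if total == 0:
        return 0.0
    weighted = sum(count / (dist + 1) for dist, count in reuse_histogram.items())
    return weighted / total

def intensity(cache_accesses, cycles):
    """Intensity Z is simply the cache-access rate (accesses per cycle)."""
    return cache_accesses / cycles
```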
Pain(A|B) = S_A × Z_B
Pain(B|A) = S_B × Z_A
Pain(A,B) = Pain(A|B) + Pain(B|A)
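In code, these equations reduce to two multiplications and a sum; a minimal sketch, assuming the S and Z values have already been derived for each application:

```python
# Minimal sketch of the Pain equations; s_* and z_* are the sensitivity and
# intensity values of the two co-scheduled applications.

def pain_given(s_victim, z_neighbor):
    # Pain(A|B) = S_A * Z_B: how much A suffers when sharing a cache with B.
    return s_victim * z_neighbor

def pain_pair(s_a, z_a, s_b, z_b):
    # Pain(A,B) = Pain(A|B) + Pain(B|A): combined pain of the pairing.
    return pain_given(s_a, z_b) + pain_given(s_b, z_a)
```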
Having derived the Pain metric based on memory-reuse profiles, we looked for a way to approximate it using just the data available online via standard hardware performance counters.
This led us to explore two performance
metrics to approximate sensitivity and
intensity: the cache-miss rate and the
cache-access rate. Intuitively, these metrics correlate with the reuse frequency and the intensity of the application.
Our findings regarding which metric
offers the best approximation are sur-
prising, so to maintain some suspense,
we postpone their revelation until the
section entitled “Evaluation of Model-
ing Techniques.”
Intuitively, Pain(A|B) approximates
the performance degradation of A
when A runs with B relative to running
solo. It will not capture the absolute
degradation entirely accurately, but it
is good for approximating relative deg-
radations. For example, given two potential neighbors for A, the Pain metric can predict which will cause the higher performance degradation for A. This is
precisely the information a contention-
aware scheduler would require.
figure 4. computing the Pain for all possible schedules. the schedule with the lowest Pain is chosen as the estimated best schedule.

Schedule {(A,B), (C,D)}: Pain = Average(Pain(A,B), Pain(C,D))
Schedule {(A,C), (B,D)}: Pain = Average(Pain(A,C), Pain(B,D))
Schedule {(A,D), (B,C)}: Pain = Average(Pain(A,D), Pain(B,C))
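The enumeration in Figure 4 can be sketched as follows. The `pain` argument is assumed to be any callable implementing the pairwise Pain metric; the thread names are placeholders.

```python
def unique_schedules(threads):
    """Split four threads into two pairs, one pair per shared cache.
    Fixing the first thread and choosing its partner yields exactly the
    three unique schedules for [A, B, C, D]."""
    first, rest = threads[0], threads[1:]
    schedules = []
    for partner in rest:
        others = tuple(t for t in rest if t != partner)
        schedules.append([(first, partner), others])
    return schedules

def best_schedule(threads, pain):
    """Average the pairwise Pain over each schedule's pairs and return
    the schedule with the lowest average."""
    def schedule_pain(schedule):
        return sum(pain(x, y) for x, y in schedule) / len(schedule)
    return min(unique_schedules(threads), key=schedule_pain)
```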
figure 5. the metric for comparing the actual best and the estimated best schedule.

Estimated Best Schedule {(A,B), (C,D)}:
DegradationEst = Average(degrad(A|B), degrad(B|A), degrad(C|D), degrad(D|C))

Actual Best Schedule {(A,C), (B,D)}:
DegradationAct = Average(degrad(A|C), degrad(C|A), degrad(B|D), degrad(D|B))

Degradation over actual best = (DegradationEst / DegradationAct − 1) × 100%
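The comparison in Figure 5 is straightforward to compute once the per-thread degradations have been measured experimentally; the degrad(X|Y) values below are assumed inputs.

```python
def average_degradation(degradations):
    """Average of the measured degrad(X|Y) values over all threads
    in a schedule."""
    return sum(degradations) / len(degradations)

def degradation_over_actual_best(est_degradations, act_degradations):
    """(DegradationEst / DegradationAct - 1) * 100%: how much worse the
    estimated best schedule performs relative to the actual best."""
    est = average_degradation(est_degradations)
    act = average_degradation(act_degradations)
    return (est / act - 1) * 100.0
```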
using contention models in a scheduler
In evaluating the new models for cache
contention, our goal was to determine
how effective the models would be for
constructing contention-free thread
schedules. We wanted the model to
help us find the best schedule and
avoid the worst one (recall Figure 2).
Therefore, we evaluated the models on
the merit of the schedules they managed to construct. With that in mind,
we describe here how the scheduler
uses the Pain metric to find the best
schedule.
To simplify the explanation of this evaluation, we assume a system with two pairs of cores, each pair sharing a cache (as illustrated in Figure 1); as mentioned earlier, however, the model also works
well with more cores per cache. In this
case, however, we want to find the best
schedule for four threads. The scheduler would construct all the possible
permutations of threads on this system, with each of the permutations being unique in terms of how the threads
are paired on each memory domain. If
we have four threads—A, B, C, and D—
there will be three unique schedules:
(1) {(A,B), (C,D)}; (2) {(A,C), (B,D)}; and (3) {(A,D), (B,C)}. Notation (A,B) means
that threads A and B are co-scheduled
in the same memory domain. For each
schedule, the scheduler estimates the
Pain for each pair: in schedule {(A,B),
(C,D)} the scheduler would estimate
Pain(A,B) and Pain(C,D) using the
equations presented previously. Then
it averages the Pain values of the pairs
to estimate the Pain for the schedule
as a whole. The schedule with the lowest Pain is deemed to be the estimated