silently from the private caches, introducing ambiguity at the shared cache
as to what is still being cached and what
has already been evicted. This lack of
information manifests as poor replacement decisions at the shared cache.
To remedy this lack of information,
a system can instead require the private caches to send explicit notification
messages whenever a block is evicted,
even when evicting clean blocks. For
example, AMD’s HT-Assist protocol
uses explicit eviction notifications on
clean-exclusive block replacements to
improve sharer state encoding. 6 If such
eviction notifications occur on every
cache eviction, the protocol enables
the shared cache to maintain precise
up-to-date tracking of private caches
that hold each block, transforming the
tracking information from conservative to exact. When an eviction decision does occur, the shared cache thus
knows which blocks are no longer being cached and likely has a choice to
evict a nonshared block to avoid a recall. However, this precision comes at a
cost in the form of increased traffic for
evictions of clean blocks, the overhead
of which was already included in the
traffic analysis.
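
The effect of exact tracking on victim selection can be sketched in a few lines of Python. This is only an illustrative model of one set of an inclusive shared cache, not the protocol of any real system; the class name `SharedCacheSet` and its methods are hypothetical, and the choice of recall victim is an arbitrary assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCacheSet:
    """One set of an inclusive shared cache with exact sharer tracking (hypothetical)."""
    ways: int
    # Maps block address -> set of core ids whose private caches hold the block.
    sharers: dict = field(default_factory=dict)

    def on_private_fill(self, block, core):
        """Record that `core` fetched `block`; return True if a recall was needed."""
        recall = False
        if block not in self.sharers and len(self.sharers) >= self.ways:
            recall = self._evict_one()
        self.sharers.setdefault(block, set()).add(core)
        return recall

    def on_eviction_notice(self, block, core):
        """Explicit notification: `core` evicted `block`, clean or dirty."""
        self.sharers.get(block, set()).discard(core)

    def _evict_one(self):
        """Prefer a block with no sharers; otherwise fall back to a recall."""
        for block, cores in self.sharers.items():
            if not cores:
                del self.sharers[block]  # nonshared victim, no recall
                return False
        victim = next(iter(self.sharers))  # assumption: arbitrary victim choice
        del self.sharers[victim]           # recall: private copies are invalidated
        return True
```

With a two-way set, filling blocks A and B, receiving an eviction notice for A, and then filling C evicts the nonshared A without a recall; a further fill of D finds only shared blocks and forces a recall.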
Explicit eviction notifications can
potentially eliminate all recalls, but
only if the associativity of the shared cache (the number of
places in which a specific block may
be cached) matches or exceeds the aggregate associativity of
the private caches. With sufficient associativity, whenever the shared cache
looks for a nonshared block to evict, if
it has exact sharing information, it is
guaranteed to find a nonshared block
and thus avoid a recall. Without this
worst-case associativity, a pathological
cluster of misses could lead to a situation in which all blocks in a set of the
shared cache are truly shared. Unfortunately, even with a modest number of
cores, the required associativity is prohibitive, as reported by Ferdman et al. 7
For example, eight cores with eight-way
set-associative private caches require a
64-way set-associative shared cache,
and the required associativity doubles
for each doubling of the number of
cores.
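
The arithmetic behind the eight-core example is simply the number of cores multiplied by the private cache associativity. A minimal sketch, assuming identical private caches on every core (the function name is hypothetical):

```python
def required_shared_associativity(cores: int, private_ways: int) -> int:
    """Worst-case shared cache associativity needed to rule out recalls.

    Assumes every core has one private cache with `private_ways` ways,
    so the aggregate private associativity is cores * private_ways.
    """
    return cores * private_ways


# Eight cores with eight-way private caches, as in the text.
print(required_shared_associativity(8, 8))
# Doubling the core count doubles the requirement.
print(required_shared_associativity(16, 8))
```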
Rather than eliminate all recalls,
we focus on a system in which recalls
are possible but rare. To estimate the
effect of limited shared cache associativity
on recall rate, we performed a
simulation modeling recalls due to enforcing
inclusion in such a system. We
pessimistically configured the private
caches to be fully associative. To factor
out the effect of any particular benchmark,
we generated a miss-address
stream to random sets of the shared
cache that prior work found accurately
approximates conflict rates. 9 We also
pessimistically assumed no data sharing
among the cores that would reduce
the inclusive capacity pressure on the
shared cache.
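
A simulation in this spirit can be sketched as follows. This is not the authors' simulator: the FIFO replacement in the private caches, the use of explicit eviction notifications, the uniformly random choice of requesting core, and all names and parameters are assumptions made for illustration.

```python
import random
from collections import OrderedDict


def simulate_recall_rate(cores, private_blocks, ratio, ways, misses, seed=0):
    """Return the fraction of misses that force a recall (illustrative model)."""
    rng = random.Random(seed)
    # Shared capacity is `ratio` times the aggregate private capacity.
    num_sets = max(1, int(ratio * cores * private_blocks) // ways)
    live = [set() for _ in range(num_sets)]  # blocks still held by a private cache
    dead = [0] * num_sets                    # blocks held by no private cache
    private = [OrderedDict() for _ in range(cores)]  # block -> shared set index
    owner = {}                               # block -> core holding it
    recalls = 0
    for block in range(misses):              # each miss is a new, unshared block
        core = rng.randrange(cores)
        target = rng.randrange(num_sets)     # random set of the shared cache
        # Fully associative private cache: evict the oldest block (FIFO)
        # and notify the shared cache that it is no longer privately cached.
        if len(private[core]) >= private_blocks:
            old, old_set = private[core].popitem(last=False)
            del owner[old]
            live[old_set].remove(old)
            dead[old_set] += 1
        # Inclusive shared cache: make room in the target set if it is full.
        if len(live[target]) + dead[target] >= ways:
            if dead[target] > 0:
                dead[target] -= 1            # evict a nonshared block, no recall
            else:
                victim = live[target].pop()  # every block is shared: recall
                del private[owner.pop(victim)][victim]
                recalls += 1
        live[target].add(block)
        private[core][block] = target
        owner[block] = core
    return recalls / misses


if __name__ == "__main__":
    for ratio in (0.5, 1.0, 2.0, 4.0):
        rate = simulate_recall_rate(4, 64, ratio, ways=8, misses=20000)
        print(f"capacity ratio {ratio}: recall rate {rate:.4f}")
```

Under these assumptions the model reproduces the qualitative trend only: an underprovisioned shared cache recalls on nearly every miss, and the rate falls as the capacity ratio grows. The absolute numbers depend on the modeling choices above.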
Figure 4 shows the recall rate, or percentage of misses that cause a recall,
for shared caches of various sizes (as
a ratio of aggregate per-core capacity)
for several shared cache associativities. When the capacity of the shared
cache is less than the aggregate per-core capacity (ratio < 1.0), almost every
request causes a recall, because the
private caches are constantly contending for an unrealistically underprovisioned shared cache. As the size of the
shared cache increases, the recall rate
drops quickly. When the shared cache capacity
reaches four times the aggregate per-core capacity, even an eight-way
set-associative shared cache keeps
Figure 3. Storage overhead in shared caches. (Plot: storage overhead in percent, 0% to 20%, versus number of cores, 1 to 4096; series: Single-Level, Two-Level, Three-Level.)
Figure 4. Likelihood a shared cache miss triggers a recall. (Plot: percentage of misses causing recalls, 0 to 100, versus ratio of aggregate private cache capacity to shared cache capacity, 0 to 8; series by associativity of shared cache: 1-way, 2-way, 4-way, 8-way; annotated region: Expected Design Space.)
July 2012 | Vol. 55 | No. 7 | Communications of the ACM 85