with r ≥ |R| we do not require that each data item is hit at
least once.
An interleaved access nest(R, m, , O[, D]) models a nested
multicursor access pattern where R is divided into m (
equal-sized) subregions. Each subregion has its own local cursor. R1
All local cursors perform the same basic pattern . O specifies, whether the global cursor picks the local cursors ran-
R domly (O = ran) or sequentially (O = seq). In the latter case,
2
D specifies, whether all traversals of the global cursor across
the local cursors use the same direction (D = uni), or whether
subsequent traversals use alternating directions (D = bi). Rm
Figure 7 shows an example.
figure 7: interleaved multicursor access: nest(R, m, s_trav(R),
seq, bi).
. . .
1
2
3
. . .
.
.
.
k
. . .
5. 1. 2. Compound Access Patterns
Database operations access more than one data region, e.g.,
their input(s) and their output, which leads to compound
data access patterns. We use , , and = ∪ ( ∩
bc b c b
= 0/ ) to denote the set of basic access patterns, compound
c
access patterns, and all access patterns, respectively.
Be ,…, ∈ (p > 1) data access patterns. There are only
1p
two principle ways to combine patterns. They are executed
either sequentially ( : × → ) or concurrently ( : ×
→ ). We can apply and repeatedly to describe more
complex patterns.
Table 2 illustrates compound access patterns of some
typical database algorithms. For convenience, reoccurring
compound access patterns are assigned a new name.
5. 2. cost functions
For each basic pattern, we estimate both sequential and ran-
dom cache misses for each cache level i ∈ { 1, .. ., N}. Given
an access pattern ∈ , we describe the number of misses
per cache level as a pair
r
Mi () = Ms i (), Mr i () ∈ N × N
containing the number of sequential and random cache
misses. The detailed cost functions for all basic patterns
introduced above can be found in Manegold.
11, 12
Local cursors
Global cursor
The major challenge with compound patterns is to
model cache interference and dependencies among basic
patterns.
5. 2. 1. Sequential Execution
When executed sequentially, patterns do not interfere.
Consequently, the resulting total number of cache misses
is at most the sum of the cache misses of all patterns.
However, if two subsequent patterns operate on the same
data region, the second might benefit from the data that
the first one leaves in the cache. It depends on the cache
size, the data sizes, and the characteristics of the patterns,
how many cache misses may be saved this way. To model
this effect, we consider the contents or state of the caches,
described by a set s of pairs 〈R, r〉 ∈ × [0, 1], stating for
each data region R the fraction r that is available in the
cache.
In Manegold11, 12 we discuss how to calculate (i) the
cache misses of a basic pattern ∈ given a cache state
qb
s q− 1 as
rr
M (Sq−
1, ) = ′ (Sq−
1,M ( )),
i i q i iq
and (ii) the resulting cache state after executing as
q
Sq (Sq−
1, ) = ′′(Sq−
1, ).
iiq iq
table 2: sample data access patterns (U, V, V', W OE )
algorithm Pattern Description name
W ← select(U) s_trav(U) s_trav( W)
W ← nested_loop_ join(U, V) s_trav(U) rs_trav(|U|, uni, V) s_trav( W) =: nl_join(U, V, W)
W ← zig_zag_ join(U, V) s_trav(U) rs_trav(|U|, bi, V) s_trav( W)
V' ← hash_build(V) s_trav(V) r_trav(V’) =: build_hash(V, V')
W ← hash_probe(U, V') s_trav(U) r_acc(|U|, V’) s_trav( W) =: probe_hash(U, V’, W)
W ← hash_ join(U, V) build_hash(V, V') probe_hash(U, V', W) =: h_joins(U, V, W)
{U }|m ← cluster(U, m) s_trav(U) nest({U }|m , m, s_trav(U ),ran) =: part(U, m, {U }|m )
j j= 1 j j= 1 j j j= 1
W ← part_nl_ join(U, V, m) part(U, m, {U }|m ) part(V, m, {V }|m ) nl_join(U , V , W ) . . . nl_join(U , V , W )
j j=
1 jj= 1 1 1 1 m m m
W ← part_h_ join(U, V, m) part(U, m, {U }|m ) part(V, m, {V }|m ) nl_join(U , V , W ) . . . h_join(U , V , W )
j j=
1 jj= 1 1 1 1 m m m