partitioned
hash-join
Two−pass radix-cluster
(001)
57
57
0
(001)
17
17
000
0
(011)03 81
( 111)
47
96 001
1
00
( 100)
92
75
(001)81 03
01
0
( 100)
20
66
1
( 110)06 92
10
(000)
96 20
11
( 101)
37
37
(010)
66
47
(001)
75 06
L
111
1
Black tuples hit (lowest 3-bits of values in parenthesis)
96
57
1
17
81
75
66 010
03
92
011
0
200
100
1
37
0
06
1
47
Two−pass radix-cluster
96
17 03(011)
32
96
17 (001)
17
32
66(010)
00
10
66
96(000)
2
2
2(010)
0
66
10
32(000)
01
03 03
35 (011)
35
35
47 ( 111)
0
1
10
200
20
20( 100)
11
0
47
47
10 (010)
R
1
or L2), cache thrashing occurs, causing the number of cache
misses to explode.
4. 2. Radix-cluster
Our Radix-Cluster algorithm2 divides a relation U into h clus-
ters using multiple passes (see Figure 3). Radix-clustering
on the lower B bits of the integer hash-value of a column is
achieved in P sequential passes, in which each pass clusters
p
tuples on Bp bits, starting with the leftmost bits (∑
1 Bp = B).
The number of clusters created by the Radix-Cluster is h =
Π P h , where each pass subdivides each cluster into h = 2Bp
1p p
new ones. When the algorithm starts, the entire relation is
considered one single cluster, and is subdivided into h = 2B1
1
clusters. The next pass takes these clusters and subdivides
each into h = 2B2 new ones, yielding h h clusters in total,
2
12
etc. With P = 1, Radix-Cluster behaves like the straightfor-
ward algorithm.
The crucial property of the Radix-Cluster is that the num-
ber of randomly accessed regions h can be kept low; while
x
still a high overall number of h clusters can be achieved
using multiple passes. More specifically, if we keep h = 2Bx
x
smaller than the number of cache lines and the number of
TLB entries, we completely avoid both TLB and cache thrashing. After Radix-Clustering a column on B bits, all tuples that
have the same B lowest bits in its column hash-value, appear
consecutively in the relation, typically forming clusters of
|U|/2B tuples (with |U| denoting the cardinality of the entire
relation).
Figure 3 sketches a Partitioned Hash-Join of two integer-based relations L and R that uses two-pass Radix-Cluster to
create eight clusters—the corresponding clusters are subsequently joined with Hash-Join. The first pass uses the two
leftmost of the lower three bits to create four partitions. In
the second pass, each of these partitions is subdivided into
two partitions using the remaining bit.
For ease of presentation, we did not apply a hash-function
in Figure 3. In practice, though, a hash-function should even
be used on integer values to ensure that all bits of the join attribute play a role in the lower B bits used for clustering. Note
that our surrogate numbers (oids) that stem from a dense
integer domain starting at 0 have the property that the lowermost bits are the only relevant bits. Therefore, hashing is not
required for such columns, and additionally, a Radix-Cluster
on all log(N) relevant bits (where N is the maximum oid from
the used domain) equals the well-known radix-sort algorithm.
experiments. Figure 4 show experimental results for a Radix-Cluster powered Partitioned Hash-Join between two memory resident tables of 8 million tuples on an Athlon PC (see
Manegold13). We used CPU counters to get a breakdo wn of cost
between pure CPU work, TLB, L1, and L2 misses. The vertical
axis shows time, while the horizontal axis varies the number
of radix-bits B used for clustering (thus it is logarithmic scale
with respect to the number of clusters h). Figure 4(a) shows
that if a normal Hash-Join is used (B = 0), running time is more
figure 4: execution time breakdown of individual join phases and overall join performance.
(a)
30
(b)
18
TLB1+ 2
L1
L2 CPU
16
8
TLB1+ 2
L1
L2
CPU
T1 T2 L1 L2
(c)
12
8
(d)
64M
35
25
T2 L2
T1 L1
12
7
P = 1
14
10
TLB1+ 2
L1
L2
CPU
7
P= 2 P= 3
6
Cluster size [bytes]
2M 64k 2k 64
T2 L2
T1 L1
10
6
20
Seconds
15
10
12
Clocks (in billions)
Seconds
8
10
86
6
Seconds
Clocks (in billions)
58
4
6
3
4
4
2
4
5
2
2
2
1
Clocks (in billions)
Seconds (points: measured, lines: modeled)
30
5
25
4
20
3
15
22
10
15
0
0 5 10 15 20
Number of radix-bits
Partitioned Hash-Join
0
00
0 5 10 15 20
Number of radix-bits
single-pass Radix-Cluster
00
0 5 10 15 20
Number of radix-bits
multi-pass Radix-Cluster
Simple
1 pass
2 passes
3 passes
minimum
0
0 5 10 15 20
Number of radix-bits
Radix-Cluster + PartHJoin