of transactions. Also, the measured
performance costs are specific to our
choice of privatization technique and
implementation; ways to reduce privatization costs have been proposed.16,17
Figure 3a shows the performance of STM-MT (manual instrumentation with
transparent privatization) on SPARC. Transparent privatization affects the
performance of STM significantly, but STM-MT still performs well,
outperforming sequential code on 11 of 17 workloads with four threads and on
13 workloads with eight threads. Moreover, at some thread count STM-MT
outperforms sequential code on every benchmark. However, its performance is
lower than that of STM-ME: STM-MT outperforms sequential code by up to 23
times, compared to 29 times with STM-ME, and by 5.6 times on average,
compared to 9.1 times with STM-ME.
Our experiments show that performance is unaffected for some workloads (such
as ssca2) but also that privatization costs can be as high as 80% (as in
Vacation Low and yada). In general, costs increase with the number of
concurrent threads, affecting both the performance and the scalability of STM.
Table 3 summarizes the costs of transparent privatization on SPARC.
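The cost metric in Table 3 is the fraction of STM-ME's speedup that STM-MT gives up, 1 - speedup(STM-MT)/speedup(STM-ME). As a rough illustration, the minimal sketch below applies it to the aggregate SPARC speedups quoted above; the helper name `privatization_cost` is ours, not the article's. Table 3 applies the metric per workload and per thread count, and the 23x and 29x peaks are not necessarily measured on the same workload, so these aggregate figures are only indicative.

```python
def privatization_cost(speedup_mt: float, speedup_me: float) -> float:
    """Fraction of STM-ME's speedup lost when transparent privatization is on.

    Same form as the Table 3 metric: 1 - speedup(STM-MT) / speedup(STM-ME).
    """
    if speedup_me <= 0:
        raise ValueError("STM-ME speedup must be positive")
    return 1.0 - speedup_mt / speedup_me


# Aggregate SPARC figures quoted in the text (peak and average speedups).
best = privatization_cost(23, 29)
average = privatization_cost(5.6, 9.1)
print(f"peak gap: {best:.2f}, average gap: {average:.2f}")
```

A cost of 0 means STM-MT matches STM-ME; a cost of 0.8 means STM-MT retains only a fifth of STM-ME's speedup.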
We repeated the experiments on the x86 machine (see Figure 3b); the results
confirm that STM-MT performs worse than STM-ME. STM-MT outperforms sequential
code on eight of 17 workloads with four threads and on 14 workloads with eight
threads.
Overall, transparent privatization overhead reduces STM performance below that
of sequential code in three benchmarks: STMBench7 read/write, STMBench7 write,
and kmeans high. Note that microbenchmarks are affected most, because their
small transactions induce cache contention on shared privatization metadata.
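To make "shared privatization metadata" concrete, the sketch below shows one common family of transparent privatization schemes, a quiescence barrier; it is a hypothetical illustration, not necessarily the technique measured in this article. Each thread publishes the start timestamp of its running transaction in a shared slot, and a committing writer waits until every transaction that began before its commit has finished. The class and method names (`QuiescenceBarrier`, `begin`, `end`, `commit_and_quiesce`) are assumptions for illustration. Python cannot exhibit the cache-line contention itself; the point is that every commit scans state written by every other thread.

```python
import threading
from typing import List, Optional


class QuiescenceBarrier:
    """Hypothetical quiescence-based privatization barrier (illustrative only).

    After committing, a writer blocks until every transaction that started
    before its commit has ended, so data it just privatized can no longer be
    touched by a concurrent transaction that still holds a stale view.
    """

    def __init__(self, n_threads: int) -> None:
        self._clock = 0
        # Shared per-thread slots (start timestamp or None). In a real STM,
        # every committing writer scans these, so their cache lines move
        # between cores on each commit.
        self._active: List[Optional[int]] = [None] * n_threads
        self._cond = threading.Condition()

    def begin(self, tid: int) -> None:
        """Mark thread `tid` as running a transaction that starts now."""
        with self._cond:
            self._clock += 1
            self._active[tid] = self._clock

    def end(self, tid: int) -> None:
        """Mark thread `tid` as no longer in a transaction (commit or abort)."""
        with self._cond:
            self._active[tid] = None
            self._cond.notify_all()  # wake writers waiting for quiescence

    def commit_and_quiesce(self, tid: int) -> None:
        """Called by a writer once its updates are published."""
        with self._cond:
            self._clock += 1
            commit_ts = self._clock
            self._active[tid] = None  # the writer itself no longer counts
            self._cond.notify_all()   # another waiting writer may depend on us
            # Block until no transaction that began before this commit is active.
            self._cond.wait_for(
                lambda: all(ts is None or ts > commit_ts for ts in self._active)
            )
```

For example, a writer that unlinks a node from a shared list would call `commit_and_quiesce` and only then modify the node with plain loads and stores. Because each commit reads every thread's slot, the work and coherence traffic per commit grow with the thread count, and they weigh most on short transactions, where commits are frequent.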
Our experiments show that privatization costs can be as high as 80% and
confirm that transparent privatization costs increase with the number of
threads. The cost of transparent privatization is higher on our four-CPU x86
machine than on SPARC, mainly because interthread communication is more
expensive; Table 3 lists the costs of transparent privatization on x86.
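The claim that x86 pays more than SPARC can be checked against the average rows of Table 3. The short sketch below simply re-enters those averages (the names `SPARC_AVG`, `X86_AVG`, and `common_rows` are ours) and reports, for each thread count measured on both machines, the fraction of STM-ME's speedup that STM-MT retains.

```python
# Average transparent privatization costs from Table 3, keyed by thread count.
SPARC_AVG = {1: 0.0, 2: 0.16, 4: 0.26, 8: 0.32, 16: 0.35, 32: 0.34, 64: 0.35}
X86_AVG = {1: 0.08, 2: 0.29, 4: 0.4, 8: 0.48, 16: 0.51}  # no x86 data beyond 16


def common_rows(a: dict, b: dict) -> list:
    """Return (threads, cost_a, cost_b) for thread counts present in both tables."""
    return [(n, a[n], b[n]) for n in sorted(a.keys() & b.keys())]


rows = common_rows(X86_AVG, SPARC_AVG)
for n, x86, sparc in rows:
    # 1 - cost is the fraction of STM-ME's speedup that STM-MT keeps.
    print(f"{n:>2} threads: x86 keeps {1 - x86:.0%}, SPARC keeps {1 - sparc:.0%}")

# x86 pays more at every shared thread count...
x86_always_higher = all(x > s for _, x, s in rows)
# ...and its average cost grows monotonically with the thread count.
counts = sorted(X86_AVG)
x86_increasing = all(X86_AVG[i] < X86_AVG[j] for i, j in zip(counts, counts[1:]))
print(x86_always_higher, x86_increasing)
```

The SPARC average row is not strictly monotonic (0.35, 0.34, and 0.35 at 16, 32, and 64 threads), which is why costs are said to increase with the thread count only in general.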
While the effect of transparent privatization can be significant, STM-MT
still scales and performs well on a range of applications. We also conclude
that reducing costs of cache-coherence traffic by having more cores on a
single chip reduces the cost of transparent privatization, resulting in
better performance and scalability.
Table 3. Transparent privatization costs (1 - speedup(STM-MT) / speedup(STM-ME)).

                   Threads
Machine  Stat      1     2     4     8     16    32    64
SPARC    min       0     0.02  0.03  0.03  0     0     0
SPARC    max       0.06  0.47  0.59  0.66  0.75  0.77  0.8
SPARC    avg       0     0.16  0.26  0.32  0.35  0.34  0.35
x86      min       0     0.03  0.06  0.08  0.17  n/a   n/a
x86      max       0.45  0.58  0.64  0.69  0.85  n/a   n/a
x86      avg       0.08  0.29  0.4   0.48  0.51  n/a   n/a
Figure 3. STM-MT performance: speedup over sequential code per workload. (3a) SPARC, 1 to 64 threads; (3b) x86, 1 to 16 threads.