real-world workloads, STAMP2 offers a range of workloads and is widely
used to evaluate TM systems. STAMP applications can be configured with
different parameters defining different workloads. In our experiments,
we used 10 workloads from the STAMP 0.9.10 distribution, including low-
and high-contention workloads for the kmeans and vacation applications
and one workload for all other applications. The exact workload settings
we used are specified in the companion technical report6; and
It is important to note that these benchmarks are TM benchmarks, most
using transactions extensively (with the exception of labyrinth and, to
a lesser extent, genome and yada). Applications that would use
transactions to simplify synchronization, but in which only a small
fraction of execution time would be spent in transactions, would benefit
from STM more than the benchmarks we used. In a sense, the benchmarks we
used represent a worst-case scenario for STM usage. For each experiment,
we computed averages from at least five runs.
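As a rough illustration of how a speedup figure can be derived from repeated runs, the sketch below averages the execution times of each configuration and divides the sequential mean by the STM mean. This is an assumption about the procedure, not a description of the authors' scripts: the text states only that averages were taken over at least five runs, and for throughput-based benchmarks the ratio would be taken over throughputs instead. The function name `speedup` and all timings are hypothetical.

```python
from statistics import mean


def speedup(sequential_times, stm_times):
    """Return mean sequential time divided by mean STM time.

    A value above 1 means the STM configuration is faster than
    the sequential, noninstrumented code.
    """
    # Assumption: mirror the "at least five runs" rule from the text.
    if len(sequential_times) < 5 or len(stm_times) < 5:
        raise ValueError("expected at least five runs per configuration")
    return mean(sequential_times) / mean(stm_times)


# Hypothetical timings in seconds, for illustration only (not measured data).
sequential = [10.0, 10.2, 9.8, 10.1, 9.9]
stm_four_threads = [5.0, 5.1, 4.9, 5.0, 5.0]

print(round(speedup(sequential, stm_four_threads), 2))  # prints 2.0
```

Averaging times before taking the ratio keeps a single slow run from dominating the result; averaging per-run ratios is an equally plausible alternative.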
STM-ME Performance
Figure 1a outlines STM-ME (manual instrumentation with explicit
privatization) speedup over sequential, noninstrumented code on SPARC,
showing that STM-ME delivers good performance with a small number of
threads, outperforming sequential code on 14 of 17 workloads with four
threads. The figure also outlines that STM outperforms sequential code
on all benchmarks we used by up to 29 times on the vacation low
benchmark. The experiment shows that the less contention the workload
exhibits, the more benefit is expected from STM; for example, STM
outperforms sequential code by more than 11 times on a read-dominated
workload of STMBench7 and less than two times for a write-dominated
workload of the same benchmark.
On x86 (see Figure 1b), STM-ME outperforms sequential code on 13
workloads with four threads. Overall, STM clearly outperforms sequential
code on all workloads, except on the challenging, high-contention
STMBench7 write workload. The performance gain, compared to sequential
code, is lower than on SPARC (up to nine times speedup on x86 compared
to 29 times on SPARC) for two reasons: All threads execute on the same
chip with SPARC, so the inter-thread communication costs less, and
sequential performance of a single thread on SPARC is much lower.
Figure 1. STM-ME performance: speedup per workload. (1a) SPARC, with
1, 2, 4, 8, 16, 32, and 64 threads; (1b) x86, with 1, 2, 4, 8, and 16
threads. Workloads: SB7 Read, SB7 Read/Write, SB7 Write, Bayes, Genome,
Intruder, Kmeans High, Kmeans Low, Labyrinth, SSCA2, Vacation High,
Vacation Low, Yada, Hashtable, Linked List, Rbtree, Skiplist.