STMs that use a global version number.49 JudoSTM50 and RingSTM51 reduce the number of atomic operations that must be performed when committing a transaction at the cost of serializing commit and/or incurring spurious aborts because of imprecise conflict detection. Several proposals have been made for STM systems that operate via dynamic binary rewriting in order to allow the usage of STM on legacy binaries.52, 53, 54
Yoo et al.55 analyze the overhead in the execution of Intel’s STM.56, 57 They identify four major sources of overhead: over-instrumentation, false sharing, amortization costs, and privatization-safety costs. False sharing, privatization-safety, and over-instrumentation are implementation artifacts that can be eliminated by using either finer-granularity bookkeeping, more refined analysis, or user annotations. Amortization costs are inherent overheads in STM that, as we demonstrated here, are not likely to be eliminated.
A large amount of research effort has been spent in analyzing the operations in TM systems. Recent software optimizations have managed to accelerate STM performance by 2 to 15 percent. We believe such analysis is a good practice that should be extended to every piece of system software, especially open source. However, the gains are only a minor dent in the overheads we observed, indicating the challenge that lies before the community in making STM performance compelling.
FIGURE 7: Percentage of Time Spent in STM End Sub-Operations
[Stacked bar chart. Y-axis: percent of cycles (normalized to fv), 0 to 100. Bars: fv and gv# for each of b+tree, delaunay, kmeans, genome, and vacation. Legend: return; cleanup transactional state; release metadata; increment gv#; write data; validate; sync; acquire metadata; check for read-only; setup; call; other.]
CONCLUSION
Based on our results, we
believe that the road ahead
for STM is quite challenging. Lowering the overhead
of STM to a point where it
is generally appealing is a
difficult task, and significantly better results have
to be demonstrated. If we
could stress a single direction for further research,
it is the elimination of
dynamically unnecessary
read and write barriers—
possibly the single most powerful lever toward further
reduction of STM overheads. Given the difficulty of similar problems explored by the research community such as
alias analysis, escape analysis, and so on, this may be an
uphill battle. Because the argument for TM hinges upon
its simplicity and productivity benefits, we are deeply
skeptical of any proposed solutions to performance problems that require extra work by the programmer.
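To make the notion of a dynamically unnecessary barrier concrete, here is a minimal single-threaded sketch in C. It is not taken from any of the STM systems discussed in this article: toy_tx, toy_tx_read, sum_naive, and sum_elided are hypothetical names, and the "barrier" only logs the address it reads. The sketch assumes a compiler or runtime could prove that the scratch array is reachable only by the current transaction, which is exactly the kind of escape-analysis question noted above as difficult.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy STM state; every name here is illustrative only. */
#define READ_LOG_CAPACITY 1024

typedef struct {
    const long *addr[READ_LOG_CAPACITY]; /* addresses logged by read barriers */
    size_t      n_reads;                 /* number of read barriers executed  */
} toy_tx;

static void toy_tx_begin(toy_tx *tx) { tx->n_reads = 0; }

/* Read barrier: log the address (a real STM would validate it at commit). */
static long toy_tx_read(toy_tx *tx, const long *p) {
    assert(tx->n_reads < READ_LOG_CAPACITY);
    tx->addr[tx->n_reads++] = p;
    return *p;
}

/* Naive instrumentation: every load in the transaction goes through a
 * barrier, including loads from `scratch`, which no other thread can see. */
static long sum_naive(toy_tx *tx, const long *shared, size_t n) {
    long scratch[8] = {0};
    long total = 0;
    for (size_t i = 0; i < n; i++)
        scratch[i % 8] += toy_tx_read(tx, &shared[i]);
    for (size_t i = 0; i < 8; i++)
        total += toy_tx_read(tx, &scratch[i]); /* dynamically unnecessary */
    return total;
}

/* Same computation, with barriers elided for the transaction-private array. */
static long sum_elided(toy_tx *tx, const long *shared, size_t n) {
    long scratch[8] = {0};
    long total = 0;
    for (size_t i = 0; i < n; i++)
        scratch[i % 8] += toy_tx_read(tx, &shared[i]);
    for (size_t i = 0; i < 8; i++)
        total += scratch[i];                   /* plain load, no barrier */
    return total;
}
```

Calling both functions on the same shared array returns the same sum, but sum_naive executes eight more read barriers per call than sum_elided. In this toy the difference is a counter; in a real STM each avoided barrier also avoids logging and later validation work.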
We observed that the TM programming model itself,
whether implemented in hardware or software, introduces complexities that limit the expected productivity
gains, thus reducing the current incentive for migration
to transactional programming and the justification at
present for anything more than a small amount of hardware support.
ACKNOWLEDGMENTS
We would like to thank Pratap Pattnaik for his continuous support, Christoph von Praun for numerous discussions and work on benchmarks and runtimes, and Rajesh
Bordawekar for the b+tree code implementation.