Figure 7. Percentage of time spent in STM end sub-operations, fv versus gv# for b+tree, delaunay, kmeans, genome, and vacation (y-axis: percent of cycles, normalized to fv; sub-operations: return, cleanup transactional state, release metadata, increment gv#, write data, validate, sync, acquire metadata, check for read-only, setup, call, other).
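Figure 7 splits the end of a transaction into sub-operations. To make those phases concrete, here is a rough, toy model of a commit routine for an STM that uses a global version number (gv#), loosely in the style of TL2. This is our own illustrative assumption, not the STM measured in this article. The class and method names (`GlobalClockSTM`, `Transaction`, `meta_for`) are hypothetical. The sketch models check for read-only, acquire metadata, increment gv#, validate, write data, release metadata, and cleanup; setup, sync, call, return, and other are not modeled.

```python
import threading


class Abort(Exception):
    """Raised when a conflict forces the transaction to abort."""


class VersionedLock:
    """Per-location metadata: a lock plus the gv# of the last commit that wrote it."""

    def __init__(self):
        self.lock = threading.Lock()
        self.version = 0


class GlobalClockSTM:
    """Hypothetical toy heap guarded by a global version number (gv#)."""

    def __init__(self):
        self.gv = 0
        self.gv_lock = threading.Lock()
        self.memory = {}   # address -> value
        self.meta = {}     # address -> VersionedLock

    def meta_for(self, addr):
        return self.meta.setdefault(addr, VersionedLock())


class Transaction:
    def __init__(self, stm):
        self.stm = stm
        self.rv = stm.gv          # read version: gv# sampled at transaction start
        self.read_set = set()
        self.write_set = {}       # buffered writes: addr -> new value

    def read(self, addr):
        if addr in self.write_set:
            return self.write_set[addr]
        m = self.stm.meta_for(addr)
        if m.lock.locked() or m.version > self.rv:
            raise Abort(addr)
        value = self.stm.memory.get(addr, 0)
        if m.lock.locked() or m.version > self.rv:   # re-check after the load
            raise Abort(addr)
        self.read_set.add(addr)
        return value

    def write(self, addr, value):
        self.write_set[addr] = value

    def commit(self):
        acquired = []
        try:
            if not self.write_set:                   # check for read-only: nothing to publish
                return True
            for addr in sorted(self.write_set):      # acquire metadata in a fixed order
                m = self.stm.meta_for(addr)
                if not m.lock.acquire(blocking=False):
                    raise Abort(addr)
                acquired.append(m)
            with self.stm.gv_lock:                   # increment gv#
                self.stm.gv += 1
                wv = self.stm.gv
            if wv != self.rv + 1:                    # validate, unless no one else committed
                for addr in self.read_set:
                    m = self.stm.meta_for(addr)
                    if m.version > self.rv or (m.lock.locked() and addr not in self.write_set):
                        raise Abort(addr)
            for addr, value in self.write_set.items():   # write data
                self.stm.memory[addr] = value
                self.stm.meta_for(addr).version = wv
            return True
        except Abort:
            return False
        finally:
            for m in acquired:                       # release metadata
                m.lock.release()
            self.read_set.clear()                    # cleanup transactional state
            self.write_set.clear()


stm = GlobalClockSTM()
t1 = Transaction(stm)
t1.write("x", 1)
ok1 = t1.commit()                 # True
t2, t3 = Transaction(stm), Transaction(stm)
t2.write("y", t2.read("x") + 1)   # t2 reads x under read version 1
t3.write("x", 5)
ok3 = t3.commit()                 # True: x's version becomes 2
ok2 = t2.commit()                 # False: x changed after t2 started
```

Even in this stripped-down form, a writing transaction pays for lock acquisition, a shared-counter increment, read-set validation, and write-back on every commit. Those per-commit costs are the kind of amortization overhead that does not disappear with better compilers.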
STMs that use a global version number.[49] JudoSTM[50] and RingSTM[51] reduce the number of atomic operations that must be performed when committing a transaction, at the cost of serializing commit and/or incurring spurious aborts because of imprecise conflict detection. Several proposals have been made for STM systems that operate via dynamic binary rewriting so that STM can be used on legacy binaries.[52, 53, 54]
Yoo et al.[55] analyze the overhead in the execution of Intel's STM.[56, 57] They identify four major sources of overhead: over-instrumentation, false sharing, amortization costs, and privatization-safety costs. False sharing, privatization safety, and over-instrumentation are implementation artifacts that can be eliminated through finer-granularity bookkeeping, more refined analysis, or user annotations. Amortization costs, by contrast, are inherent STM overheads that, as we have demonstrated here, are unlikely to be eliminated.
A large amount of research effort has been spent analyzing the operations in TM systems. Recent software optimizations have managed to accelerate STM performance by 2 to 15 percent. We believe such analysis is good practice that should be extended to every piece of system software, especially open source software. However, these gains make only a minor dent in the overheads we observed, which shows how large a challenge the community faces in making STM performance compelling.
Based on our results, we believe that the road ahead for STM is quite challenging. Lowering the overhead of STM to a point where it is generally appealing is a difficult task, and significantly better results have to be demonstrated. If we could stress a single direction for further research, it would be the elimination of dynamically unnecessary read and write barriers, possibly the single most powerful lever for further reducing STM overheads. Given the difficulty of related problems studied by the research community, such as alias analysis and escape analysis, this may be an uphill battle. Because the argument for TM hinges on its simplicity and productivity benefits, we are deeply skeptical of any proposed solution to these performance problems that requires extra work by the programmer.
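One familiar case of a dynamically unnecessary barrier is a transactional access to memory the transaction itself allocated. That memory is invisible to other threads until commit and is simply discarded on abort, so logging it buys nothing. The sketch below is a hypothetical toy (`FilteringTx`, `alloc`, and the `logged`/`elided` counters are illustrative names, not any real STM's API) that checks at run time whether an address is such "captured" memory and skips the bookkeeping if so. It shows one instance of the idea under these assumptions, not a technique proposed in this article, and it omits read-set validation entirely.

```python
class FilteringTx:
    """Hypothetical toy transaction that skips barrier work for captured memory."""

    def __init__(self, shared):
        self.shared = shared        # shared heap: address -> value
        self.read_log = []          # reads a real STM would validate at commit
        self.write_log = {}         # buffered writes to shared data
        self.captured = {}          # objects allocated inside this transaction
        self.next_id = 0
        self.logged = 0             # barriers that did STM bookkeeping
        self.elided = 0             # barriers skipped at run time

    def alloc(self, value=0):
        addr = ("tx-local", self.next_id)   # hypothetical address scheme
        self.next_id += 1
        self.captured[addr] = value
        return addr

    def read(self, addr):
        if addr in self.captured:           # no other thread can see it: skip logging
            self.elided += 1
            return self.captured[addr]
        self.logged += 1
        self.read_log.append(addr)
        if addr in self.write_log:          # read-after-write sees the buffered value
            return self.write_log[addr]
        return self.shared[addr]

    def write(self, addr, value):
        if addr in self.captured:           # discarded on abort anyway: no log needed
            self.elided += 1
            self.captured[addr] = value
            return
        self.logged += 1
        self.write_log[addr] = value

    def commit(self):
        # Validation of read_log is omitted in this sketch.
        self.shared.update(self.captured)   # newly allocated objects become shared
        self.shared.update(self.write_log)  # publish buffered writes


heap = {"balance": 100}
tx = FilteringTx(heap)
node = tx.alloc()
tx.write(node, tx.read("balance") + 1)
tx.write("balance", tx.read(node))
tx.commit()
```

Here two of the four barriers do no STM work. The catch, consistent with the concern above, is that the run-time check itself costs cycles; the payoff comes only when the test is much cheaper than the logging it avoids, or when a compiler can prove the access captured and remove the barrier statically.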
We observed that the TM programming model itself, whether implemented in hardware or software, introduces complexities that limit the expected productivity gains. This reduces the current incentive to migrate to transactional programming and, for now, weakens the justification for anything more than a small amount of hardware support.
ACKNOWLEDGMENTS
We would like to thank Pratap Pattnaik for his continuous support, Christoph von Praun for numerous discussions and work on benchmarks and runtimes, and Rajesh Bordawekar for the b+tree code implementation.
References: