component of the total time.
Overhead Optimizations. There have been many proposals on reducing STM overheads through compiler or runtime techniques, most of which are complementary to STM hardware acceleration.
˲ Redundant barrier elimination. One technique is to eliminate barriers to thread-local objects through escape analysis. Such analysis is typically quite effective identifing thread-local accesses that are close to the object allocation site. It can eliminate both read- and write-barriers, but is often more effective on write-barriers. For example, we observe that an intra-procedural escape analysis can eliminate 40–50% of write barriers in vacation, genome, and b+tree. However, its impact on performance is more limited: from negligible to 12%. To target redundant read-barriers, a whole-program analysis called Not-Ac-cessed-In-Transaction analysis27 eliminates some barriers to read-only objects in transactions;
˲ Barrier strength reduction. These optimizations do not eliminate barriers, but identify at runtime special locations that require only lightweight barrier processing, such as dynamic tracking of thread-local objects11, 27 and runtime filtering of stack references and duplicate references; 11
˲ Code generation optimizations. One common technique is to inline the fast path of barriers. It has the potential benefit of reducing function call overhead, increasing ILP, and exposing reuse of common sub-barrier operations. In our experiments, compiler inlining achieved less than 2% overall improvement across our benchmark suite;
˲ Commit sequence optimizations. Eliminating unnecessary global version number updates37 improves the overall performance of several micro-bench-marks by up to 14%.
Such optimizations have a positive impact on STM performance. However, the results presented here indicate how much further innovation is needed for the performance of STMs to become generally appealing to users.
posed STM systems. 7 Conflict detection is simplified significantly by the static nature because conflicts can be ruled out already when ownership records are acquired (at transaction start).
DSTM12 is the first dynamic STM system; the design follows a per-object runtime organization (locator object). Variables (objects) in the application heap refer to a locator object. Unlike in a design with ownership records (for example, Harris and Fraser10), the locator does not store a version number but refers to the most recently committed version of the object. A particularity of the DSTM design is that objects must be explicitly ‘opened’ (in read-only or read-write mode) before transactional access; also DSTM allows for early release. The authors argue that both mechanisms facilitate the reduction of conflicts.
The design principles of the RSTM18 system are similar to DSTM in that it associates transactional metadata with objects. Unlike DSTM however, the system does not require the dynamic allocation of transactional data but co-locates it with the non-transactional data. This scheme has two benefits: first, it facilitates spatial access locality and hence fosters execution performance and transaction throughput. Second, the dynamic memory management of transactional data (usually done through a garbage collector) is not necessary and
hence this scheme is amenable for use in environments where memory management is explicit.
Recent work explored algorithmic optimizations and/or alternative implementations of the basic STM algorithms described here. Riegel et al. propose the use of real-time clocks to enhance the STM scalability using a global version number. 22 JudoS TM21 and RingSTM29 reduce the number of atomic operations that must be performed when committing a transaction at the cost of serializing commit and/or incurring spurious aborts due to imprecise conflict detection. Several proposals have been made for STMs that operate via dynamic binary rewriting in order to allow the usage of STM on legacy binaries. 8, 21, 33
Yoo et. al36 analyze the overhead in the execution of Intel’s STM. 14, 23 They identify four major sources of overhead: over-instrumentation, false sharing, amortization costs, and privatization-safety costs. False sharing, privatiza-tion-safety, and over-instrumentation are implementation artifacts that can be eliminated by either using finer granularity bookkeeping, more refined analysis, or user annotations. Amortization costs are inherent overheads in an STM that, as we demonstrated here, are not likely to be eliminated.
A large amount of research effort has been spent in analyzing the opera-
figure 7: Percentage of time spent in stm end sub-operations.
return cleanup transactional state release metadata increment gv#
write data
validate
sync
acquire metadata
check for read-only
setup
call
other
100
90
of cycles (norm. to fv)
80
70
60
50
40
30
The first STM system was proposed by Shavit and Touitou26 and is based on object ownership. The protocol is static, which is a significant shortcoming that has been overcome by subsequently pro-
20
10
0
fv gv#
b+tree
fv gv#
delaunay
fv gv#
kmeans
fv gv#
genome
fv gv#
References:
Archives