component of the total time.
Overhead Optimizations. There have
been many proposals on reducing STM
overheads through compiler or runtime
techniques, most of which are complementary to STM hardware acceleration.
˲ Redundant barrier elimination. One
technique is to eliminate barriers to
thread-local objects through escape
analysis. Such analysis is typically quite
effective identifing thread-local accesses that are close to the object allocation
site. It can eliminate both read- and
write-barriers, but is often more effective on write-barriers. For example, we
observe that an intra-procedural escape
analysis can eliminate 40–50% of write
barriers in vacation, genome, and b+tree.
However, its impact on performance is
more limited: from negligible to 12%.
To target redundant read-barriers, a
whole-program analysis called Not-Ac-cessed-In-Transaction analysis27
eliminates some barriers to read-only objects
in transactions;
˲ Barrier strength reduction. These optimizations do not eliminate barriers,
but identify at runtime special locations
that require only lightweight barrier
processing, such as dynamic tracking of
thread-local objects11, 27 and runtime filtering of stack references and duplicate
references; 11
˲ Code generation optimizations. One
common technique is to inline the fast
path of barriers. It has the potential
benefit of reducing function call overhead, increasing ILP, and exposing reuse of common sub-barrier operations.
In our experiments, compiler inlining
achieved less than 2% overall improvement across our benchmark suite;
˲ Commit sequence optimizations.
Eliminating unnecessary global version
number updates37 improves the overall
performance of several micro-bench-marks by up to 14%.
Such optimizations have a positive
impact on STM performance. However,
the results presented here indicate how
much further innovation is needed for
the performance of STMs to become
generally appealing to users.
posed STM systems. 7 Conflict detection
is simplified significantly by the static
nature because conflicts can be ruled
out already when ownership records are
acquired (at transaction start).
DSTM12 is the first dynamic STM
system; the design follows a per-object
runtime organization (locator object).
Variables (objects) in the application
heap refer to a locator object. Unlike
in a design with ownership records (for
example, Harris and Fraser10), the locator does not store a version number but
refers to the most recently committed
version of the object. A particularity of
the DSTM design is that objects must be
explicitly ‘opened’ (in read-only or read-write mode) before transactional access;
also DSTM allows for early release. The
authors argue that both mechanisms facilitate the reduction of conflicts.
The design principles of the RSTM18
system are similar to DSTM in that it associates transactional metadata with objects. Unlike DSTM however, the system
does not require the dynamic allocation
of transactional data but co-locates it
with the non-transactional data. This
scheme has two benefits: first, it facilitates spatial access locality and hence
fosters execution performance and
transaction throughput. Second, the dynamic memory management of transactional data (usually done through a
garbage collector) is not necessary and
hence this scheme is amenable for use
in environments where memory management is explicit.
Recent work explored algorithmic
optimizations and/or alternative implementations of the basic STM algorithms
described here. Riegel et al. propose the
use of real-time clocks to enhance the
STM scalability using a global version
number. 22 JudoS TM21 and RingSTM29 reduce the number of atomic operations
that must be performed when committing a transaction at the cost of serializing commit and/or incurring spurious
aborts due to imprecise conflict detection. Several proposals have been made
for STMs that operate via dynamic binary rewriting in order to allow the usage
of STM on legacy binaries. 8, 21, 33
Yoo et. al36 analyze the overhead in
the execution of Intel’s STM. 14, 23 They
identify four major sources of overhead:
over-instrumentation, false sharing,
amortization costs, and privatization-safety costs. False sharing, privatiza-tion-safety, and over-instrumentation
are implementation artifacts that can
be eliminated by either using finer
granularity bookkeeping, more refined
analysis, or user annotations. Amortization costs are inherent overheads in an
STM that, as we demonstrated here, are
not likely to be eliminated.
A large amount of research effort
has been spent in analyzing the opera-
figure 7: Percentage of time spent in stm end sub-operations.
return
cleanup transactional state
release metadata
increment gv#
write data
validate
sync
acquire metadata
check for read-only
setup
call
other
100
90
of cycles (norm. to fv)
80
70
60
50
40
30
Related Work
The first STM system was proposed by
Shavit and Touitou26 and is based on
object ownership. The protocol is static,
which is a significant shortcoming that
has been overcome by subsequently pro-
20
10
0
fv gv#
b+tree
fv gv#
delaunay
fv gv#
kmeans
fv gv#
genome
fv gv#