contributed articles
DOI: 10.1145/1924421.1924440
Despite earlier claims, Software Transactional
Memory outperforms sequential code.
By Aleksandar Dragojević, Pascal Felber,
Vincent Gramoli, and Rachid Guerraoui
Why STM Can Be More Than a Research Toy
While multicore architectures are increasingly
the norm in CPUs, concurrent programming
remains a daunting challenge for many. The
transactional-memory paradigm simplifies
concurrent programming by enabling programmers
to focus on high-level synchronization concepts,
or atomic blocks of code, while ignoring low-level
implementation details.
Hardware transactional memory (HTM) has shown
promising results for leveraging parallelism4 but
is restrictive, handling only transactions of limited
size or requiring some system events or CPU
instructions to be executed outside transactions.4
Despite attempts to address these issues,19 TM
systems fully implemented in hardware are unlikely
to be commercially available for at least the next few
years. More likely is that future deployed TMs will be
hybrids containing a software component and a
hardware component.
Software transactional memory15,23
circumvents the limitations of HTM by
implementing TM functionality fully
in software. Moreover, several STM implementations
are freely available and appealing for concurrent
programming.1,5,7,11,14,16,20 Yet STM credibility
depends on the extent to which it enables application
code to leverage multicore architectures and outperform
sequential code. Cascaval et al.’s 2008
article3 questioned this ability and suggested
confining STM to the status of
“research toy.” STMs indeed introduce
significant runtime overhead:
Synchronization costs. Each read (or
write) of a memory location from inside
a transaction is performed by a call to
an STM routine for reading (or writing)
data. With sequential code, this access
is performed by a single CPU instruction. STM read and write routines are
significantly more expensive than corresponding CPU instructions, as they
typically “bookkeep” data about every
access. STMs check for conflicts, log
each access, and, in the case of a write, log the
current (or old) value of the data. Some
of these operations use expensive synchronization instructions and access
shared metadata, further increasing
their cost.
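To make the bookkeeping described above concrete, here is a minimal sketch of what an STM read and write routine might do in place of a plain load or store. It is an illustration only, not the design of any STM cited in this article: the names `TVar`, `Transaction`, `Conflict`, and `atomically` are hypothetical, the per-location lock and version counter are assumed metadata, and the sketch makes no claim to handle every concurrent interleaving correctly or to reflect real performance.

```python
import threading


class Conflict(Exception):
    """Raised when a transaction must abort and retry (hypothetical name)."""


class TVar:
    """A shared location plus the per-location metadata this sketch assumes."""

    def __init__(self, value):
        self.value = value
        self.version = 0               # incremented by each committed write
        self.lock = threading.Lock()   # shared metadata: write ownership


class Transaction:
    def __init__(self):
        self.read_log = {}   # TVar -> version seen at first read
        self.undo_log = {}   # TVar -> old value saved before first write

    def read(self, tvar):
        # Conflict check: another transaction currently owns this location.
        if tvar not in self.undo_log and tvar.lock.locked():
            raise Conflict()
        self.read_log.setdefault(tvar, tvar.version)   # log the access
        return tvar.value

    def write(self, tvar, value):
        if tvar not in self.undo_log:
            # Synchronization on shared metadata: try to take ownership.
            if not tvar.lock.acquire(blocking=False):
                raise Conflict()
            self.undo_log[tvar] = tvar.value           # log the old value
            # Conflict check: the location changed since we first read it.
            if self.read_log.get(tvar, tvar.version) != tvar.version:
                raise Conflict()
        tvar.value = value                             # update in place

    def commit(self):
        # Validate that every location read is still at the logged version.
        for tvar, seen in self.read_log.items():
            if tvar.version != seen:
                raise Conflict()
        for tvar in self.undo_log:
            tvar.version += 1
            tvar.lock.release()
        self.read_log.clear()
        self.undo_log.clear()

    def abort(self):
        # Roll back in-place writes from the undo log and release ownership.
        for tvar, old in self.undo_log.items():
            tvar.value = old
            tvar.lock.release()
        self.read_log.clear()
        self.undo_log.clear()


def atomically(body):
    """Run body(tx) as a transaction, retrying on conflict."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Conflict:
            tx.abort()


x, y = TVar(10), TVar(0)


def transfer(tx):
    tx.write(x, tx.read(x) - 3)
    tx.write(y, tx.read(y) + 3)


atomically(transfer)
print(x.value, y.value)  # 7 3
```

Even in this toy form, each transactional read touches a lock and a dictionary, and each first write adds a lock acquisition and an undo-log entry, whereas the sequential equivalent is a single load or store.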
Compiler overinstrumentation. Using
Key insights
STM is improving in terms of
performance, often outperforming
sequential, nontransactional code
when running with just four CPU cores.

Parallel applications exhibiting high
contention are not the primary
target for STM and are therefore not
the best benchmarks for evaluating
STM performance.

STM with support for compiler
instrumentation and explicit,
nontransparent privatization
outperforms sequential code in
all but one workload we used and
still supports a programming
model that is easy for typical
programmers to use.