Recording transactions. In the explanation above we assumed that the exact address of each memory location used in the transaction is recorded.
This might be inefficient, though, especially with HTM support. Recording information per memory location would mean an overhead of several words for each location used. As with CPU caches, which in theory could also cache individual words, this price is usually too high; CPU caches today therefore handle cache lines of 64 bytes at once. For a TM implementation, the analogous approach means that step 2 in our description does not have to record an additional dependency if the memory location lies in a block that is already recorded.
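As a minimal sketch of what such block-granularity recording could look like (the helper name, the fixed-size read set, and the 64-byte block size are assumptions for illustration, not part of any particular TM implementation):

#include <stdint.h>
#include <stddef.h>

#define BLOCK_SIZE 64u    /* assumed cache-line/block size */
#define MAX_DEPS   128

static uintptr_t read_set[MAX_DEPS];
static size_t    read_set_len;

static void record_read(const void *addr)
{
    /* round the address down to its block */
    uintptr_t block = (uintptr_t)addr & ~(uintptr_t)(BLOCK_SIZE - 1);
    for (size_t i = 0; i < read_set_len; ++i)
        if (read_set[i] == block)
            return;                       /* block already recorded */
    read_set[read_set_len++] = block;     /* new dependency */
}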
But this introduces problems as well. Assume that in the final example code all four variables are in the same block. Then our assumption that f1_1 and f2_2 can execute independently is wrong. This type of block sharing leads to high abort rates. It is related to the problem of false sharing, which in this case occurs as well and therefore should be corrected anyway.
These false aborts, as we might want to call them, are not only a performance issue, though. Unfortunate placement of variables might actually keep transactions from ever making progress because they constantly and inadvertently abort each other. This can happen when several transactions running concurrently touch the same memory blocks but at different addresses. If block-granularity recording is used, this is a problem that must be solved.
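One way to correct such a placement, assuming the earlier example used a counter and a timestamp per transaction, is to group and align the data so that each transaction's variables occupy their own block. The sketch below uses C11 alignment; the struct layout and the 64-byte figure are assumptions for illustration:

#include <time.h>

struct tx_data {
    _Alignas(64) long counter;   /* 64-byte alignment pads the struct
                                    out to a full block */
    time_t timestamp;
};

static struct tx_data d1;   /* only f1_1 touches this block */
static struct tx_data d2;   /* only f2_2 touches this block */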
Handling aborts. Another detail described earlier is the way aborts are handled. What has been described is the so-called lazy abort/lazy commit method (lazy/lazy for short). Transactions continue to execute even after they have effectively been aborted, and the results of the transaction are written to the real memory locations only once the entire transaction has succeeded.
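To make the difference concrete, a minimal sketch of the store side of a lazy/lazy scheme might look like this (the function names, the fixed-size buffer, and word-sized stores are assumptions for illustration, not a description of any real runtime):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

struct wb_entry { void *addr; uintptr_t val; };

static struct wb_entry write_buf[64];
static size_t write_buf_len;

static void tx_store(void *addr, uintptr_t val)
{
    /* buffer the store; real memory stays untouched until commit */
    write_buf[write_buf_len++] = (struct wb_entry){ addr, val };
}

static void tx_commit(void)
{
    /* the transaction succeeded: write the buffered values back */
    for (size_t i = 0; i < write_buf_len; ++i)
        memcpy(write_buf[i].addr, &write_buf[i].val, sizeof(uintptr_t));
    write_buf_len = 0;
}

static void tx_abort(void)
{
    write_buf_len = 0;   /* nothing to undo; memory was never written */
}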
This is not the only possibility, though. Another is the exact opposite: the eager/eager method. In this case transactions are recognized as aborted as early as possible and restarted if necessary, and store instructions take effect immediately. The old value of each written memory location therefore has to be saved in memory local to the transaction so that, should the transaction abort, the previous content can be restored.
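A correspondingly minimal sketch of the eager/eager store side, again with assumed names and a fixed-size log, could look like this:

#include <stdint.h>
#include <stddef.h>

struct undo_entry { uintptr_t *addr; uintptr_t old; };

static struct undo_entry undo_log[64];
static size_t undo_log_len;

static void tx_store_eager(uintptr_t *addr, uintptr_t val)
{
    /* save the previous content transaction-locally, then write through */
    undo_log[undo_log_len++] = (struct undo_entry){ addr, *addr };
    *addr = val;
}

static void tx_abort_eager(void)
{
    /* restore in reverse order in case a location was written twice */
    while (undo_log_len > 0) {
        struct undo_entry e = undo_log[--undo_log_len];
        *e.addr = e.old;
    }
}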
There are plenty of other ways to handle the details, and it might turn out that no single way is sufficient. Much will depend on the abort rate of the individual transaction. It could very well be that compilers and TM runtimes will implement several different methods at the same time and switch between them for individual transactions whenever this seems to offer an advantage.
Semantics. The semantics of the tm_atomic block (or whatever it will be in the end) have to be specified. It is necessary to integrate TM into the rest of the language semantics. For example, TM must be integrated with exception handling in C++. Other issues are the handling of nested TM regions and the treatment of local variables (they need not be part of the transaction but still have to be reset on abort).
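To illustrate the local-variable point, a sketch might look like the following; tm_atomic is the hypothetical keyword used throughout this article, and the variable names are assumptions:

extern long counter1;        /* shared: accesses are tracked by TM */

void f(void)
{
    int attempts = 0;        /* local: not part of the transaction */
    tm_atomic {
        ++attempts;          /* need not be tracked as a dependency, but
                                on an abort this increment must be undone
                                so a retry does not see the failed run */
        ++counter1;          /* tracked read-modify-write */
    }
}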
Performance. Performance is also a major issue. Plenty of optimizations can and should be performed by the compiler, and all of this needs research.
There are also practical problems. If the same program code is used in both contended and uncontended situations (for example, in a single-threaded program), the overhead introduced by TM is too high. It therefore might be necessary to generate two versions of each function: one with TM support and one without. The TM runtime then has to make sure that the version without TM support is used as often as possible. Erring on one side means lost performance; erring on the other means the program will not run correctly.
Conclusion
TM promises to make parallel programming much easier. The concept of transactions is already present in many programs (from business applications to dynamic Web applications), and it has proved reasonably easy for programmers to grasp. The first implementations are appearing now, but all are far from ready for prime time. Much research remains to be done.
Ulrich Drepper (drepper@redhat.com) is a consulting engineer at Red Hat, where he has worked for the past 12 years. He is interested in all kinds of low-level programming and has been involved with Linux for almost 15 years.
A previous version of this article appeared in the September 2008 issue of ACM Queue.