Transactional Memory
The concept of transactions can be transferred to memory operations performed in programs as well. One could, of course, regard the in-memory data a program keeps as tables corresponding to those in databases and then just implement the same functionality. This is rather limiting, though, since it forces programmers to dramatically alter the way they write programs, and systems programming cannot live with such restrictions.
Fortunately, this is not needed. The concept of transactional memory (TM) has been defined without this restriction. Maurice Herlihy and J. Eliot B. Moss in their 1993 paper describe a hardware implementation that can be built on top of existing cache-coherency protocols reasonably easily.2
The description in the paper is generic. First, there is no need to require that transactional memory be implemented in hardware, exclusively or even in part. For the purpose mentioned in the paper's title (lock-free data structures), hardware support is likely going to be essential. But this is not true in general, as we will see shortly. Second, the description must be transferred to today's available hardware. This includes implementation details such as the possible reuse of the cache-coherency protocol and the granularity of the transactions, which most likely will not be a single word but instead a cache line.
Hardware support for TM will itself be mostly interesting for the implementation of lock-free data structures. To implement, for example, the insertion of a new element into a doubly linked list without locking, four pointers have to be updated atomically. These pointers are found in three list elements, which means it is not possible to implement this using simple atomic operations. Hardware TM (HTM) provides a means to implement atomic operations operating on more than one memory word.
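To make this concrete, here is a minimal sketch of the insertion in plain C (the node type and function name are illustrative, not taken from the paper). The four pointer stores span three nodes and would all have to commit atomically for a lock-free version to be correct:

struct node {
  struct node *next;
  struct node *prev;
};

/* Insert n after pos; assumes pos has a successor (for
   example, a circular list with a sentinel node). */
void insert_after(struct node *pos, struct node *n)
{
  n->prev = pos;          /* store 1 (in n)                 */
  n->next = pos->next;    /* store 2 (in n)                 */
  pos->next->prev = n;    /* store 3 (in the old successor) */
  pos->next = n;          /* store 4 (in pos)               */
}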
To provide more general support for transactional memory beyond atomic data structures, software support is needed. For example, any hardware implementation will have a limit on the size of a transaction. These limits might be too low for nontrivial programs, or they might differ among implementations. Software can and must complete the HTM support to extend the reach of the TM implementation meant to be used for general programming.
This has been taken a step further. Because today's hardware is mostly lacking in HTM support, software TM (STM) is what most research projects are using today. With STM-based solutions it is possible to provide interfaces to TM functionality, which later could be implemented in hybrid TM implementations, using hardware if possible. This allows programmers to write programs using the simplifications TM provides even without HTM support in the hardware.
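As a taste of what such an interface can look like, here is a minimal sketch using GCC's transactional-memory extension (compiled with -fgnu-tm); this is one existing STM/hybrid interface chosen for illustration, not the one defined in the paper, and the variable and function names are hypothetical:

#include <time.h>

long counter;
time_t stamp;

/* The __transaction_atomic block executes as one transaction;
   the runtime is free to implement it in software, in
   hardware, or as a hybrid of the two. */
void read_pair(long *r, time_t *t)
{
  __transaction_atomic {
    *t = stamp;
    *r = counter++;
  }
}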
Show Me the Problem
To convince the reader that TM is worth all the trouble, let's look at a little example. This is not meant to reflect realistic code but instead illustrates problems that can happen in real code:
long counter1;
long counter2;
time_t timestamp1;
time_t timestamp2;

void f1_1(long *r, time_t *t) {
  *t = timestamp1;
  *r = counter1++;
}

void f2_2(long *r, time_t *t) {
  *t = timestamp2;
  *r = counter2++;
}

void w1_2(long *r, time_t *t) {
  *r = counter1++;
  if (*r & 1)
    *t = timestamp2;
}

void w2_1(long *r, time_t *t) {
  *r = counter2++;
  if (*r & 1)
    *t = timestamp1;
}
Assume this code has to be made thread safe. This means that multiple threads can concurrently execute any of the functions and that doing so must not produce any invalid result. The latter is defined here as returning counter and timestamp values that do not belong together.
It is certainly possible to define one single mutex lock and require that this mutex is taken in each of the four functions. It is easy to verify that this would generate the expected results, but the performance is potentially far from optimal.
Assume that most of the time only the functions f1_1 and f2_2 are used. In this case there would never be any conflict between callers of these functions: callers of f1_1 and f2_2 could peacefully coexist. This means that using one single lock unnecessarily slows down the code.
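For concreteness, here is a minimal sketch of the single-lock variant of one of the functions, using a hypothetical POSIX mutex named g_lock (the other three functions would be wrapped the same way):

#include <pthread.h>
#include <time.h>

extern long counter1;
extern time_t timestamp1;

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

/* Correct, but callers of f1_1 and f2_2 now serialize on the
   same mutex even though they touch disjoint variables. */
void f1_1(long *r, time_t *t)
{
  pthread_mutex_lock(&g_lock);
  *t = timestamp1;
  *r = counter1++;
  pthread_mutex_unlock(&g_lock);
}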
So, then, use two locks. But how to define them? The semantics would have to be in the one case “when counter1 and timestamp1 are used” and “when counter2 and timestamp2 are used,” respectively. This might work for f1_1 and f2_2, but it won't work for the other two functions. Here the pairs counter1/timestamp2 and counter2/timestamp1 are used together. So we have to go yet another level down and assign a separate lock to each of the variables. Assuming we would do this, we could easily be tempted to write something like this (only two functions are shown here; the other two are mirror images):
void f1_1(long *r, time_t *t) {
  lock(l_timestamp1);
  lock(l_counter1);
  *t = timestamp1;
  *r = counter1++;
  unlock(l_counter1);
  unlock(l_timestamp1);
}

void w1_2(long *r, time_t *t) {
  lock(l_counter1);
  *r = counter1++;
  if (*r & 1) {
    lock(l_timestamp2);
    *t = timestamp2;
    unlock(l_timestamp2);
  }
  unlock(l_counter1);
}
The code for w1_2 in this example is wrong. We cannot delay getting the l_timestamp2 lock because it might produce inconsistent results. Even though it might be slower, we always have to get the lock:
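A minimal sketch of such a version, using the same hypothetical lock/unlock primitives as above and taking the timestamp lock unconditionally, before the counter is touched:

void w1_2(long *r, time_t *t) {
  lock(l_counter1);
  lock(l_timestamp2);
  /* Both locks are held for the whole body, so the
     counter/timestamp pair cannot be torn apart. */
  *r = counter1++;
  if (*r & 1)
    *t = timestamp2;
  unlock(l_timestamp2);
  unlock(l_counter1);
}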