transactional memory

The concept of transactions can be transferred to memory operations performed in programs as well. One could of course regard the in-memory data a program keeps as tables corresponding to those in databases, and then just implement the same functionality. This is rather limiting, though, since it forces programmers to dramatically alter the way they are writing programs, and systems programming cannot live with such restrictions.

Fortunately, this is not needed. The concept of transactional memory (TM) has been defined without this restriction. Maurice Herlihy and J. Eliot B. Moss in their 1993 paper describe a

2

hardware implementation that can be implemented on top of existing cache coherency protocols reasonably easily. 1

The description in the paper is generic. First, there is no need to require that transactional memory is implemented in hardware, exclusively or even in part. For the purpose mentioned in the paper’s title (lock-free data structures), hardware support is likely going to be essential. But this is not true in general, as we will see shortly. Second, the description must be transferred to today’s available hardware. This includes implementation details such as the possible reuse of the cache coherency protocol and the granularity of the transactions, which most likely will not be a single word but instead a cache line.

Hardware support for TM will itself be mostly interesting for the implementation of lock-free data structures. To implement, for example, the insert of a new element into a double-linked list without locking, four pointers have to be updated atomically. These pointers are found in three list elements, which mean it is not possible to implement this using simple atomic operations. Hardware TM (HTM) provides a means to implement atomic operations operating on more than one memory word. To provide more general support for transactional memory beyond atomic data structures, software support is needed. For example, any hardware implementation will have a limit on the size of a transaction. These limits might be too low for nontrivial programs or they might differ among implementations.

Software can and must complete the HTM support to extend the reach of the TM implementation meant to be used for general programming.

This has been taken a step further. Because today’s hardware is mostly lacking in HTM support, software TM (STM) is what most research projects are using today. With STM-based solutions it is possible to provide interfaces to TM functionality, which later could be implemented in hybrid TM implementations, using hardware if possible. This allows programmers to write programs using the simplifications TM provides even without HTM support in the hardware.

 

show me the Problem To convince the reader that TM is worth all the trouble, let’s look at a little example. This is not meant to reflect realistic code but instead illustrates problems that can happen in real code:

 

long counter1; long counter2; time _ t timestamp1; time _ t timestamp2;

void f1 _ 1(long *r, time _ t *t) {

*t = timestamp1;

*r = counter1++;

}

void f2 _ 2(long *r, time_ t *t) {

*t = timestamp2;

*r = counter2++;

}

void w1 _ 2(long *r, time _ t *t) {

*r = counter1++; if (*r & 1)

*t = timestamp2;

}

 

void w2_ 1(long *r, time _ t *t) { *r = counter2++; if (*r & 1)

*t = timestamp1;

}

 

Assume this code has to be made thread safe. This means that multiple threads can concurrently execute any of the functions and that doing so must not produce any invalid result. The latter is defined here as return counter and timestamp values that don’t belong together.

It is certainly possible to define one single mutex lock and require that this mutex is taken in each of the four functions. It is easy to verify that this would generate the expected results, but the performance is potentially far from optimal.

Assume that most of the time only the functions f1 _ 1 and f2 _ 2 are used. In this case there would never be any conflict between callers of these functions: callers of f1 _ 1 and f2 _ 2 could peacefully coexist. This means that using one single lock unnecessarily slows down the code.

So, then, use two locks. But ho w to define them? The semantics would have to be in the one case “when counter1 and timestamp1 are used” and “when coun- ter2 and timestamp2 are used,” respectively. This might work for f1 _ 1 and f2 _ 2, but it won’t work for the other two functions. Here the pairs counter1/ timestamp2 and counter2/timestamp1 are used together. So we have to go yet another level down and assign a separate lock to each of the variables.

Assuming we would do this, we could easily be tempted to write something like this (only two functions are mentioned here; the other two are mirror images):

 

void f1_ 1(long *r, time _t *t) { lock(l _ timestamp1); lock(l _ counter1);

*t = timestamp1; *r = counter1++;

}

 

void w1 _ 2(long *r, time _t *t) { lock(l _ counter1);

*r = counter1++;

if (*r & 1) { lock(l _ timestamp1); *t = timestamp2; unlock(l _ timestamp1);

}

unlock(l _ counter1);

}

 

The code for w1 _ 2 in this example is wrong. We cannot delay getting the l _ timestamp1 lock because it might produce inconsistent results. Even though it might be slower, we always have to get the lock:

 

40 CommunICatIons of the aCm | feBRuaRY 2009 | vol. 52 | No. 2

References:

Archives