2. BACKGROUND
Throughout this paper we study concurrency between
threads running on a shared-memory machine; we do not
consider questions of external interaction through storage
systems or databases, nor do we address distributed systems.
The kinds of problem we have in mind are building collection
classes (queues, lists, and so on) and other data structures
that concurrent threads use to maintain shared information. There are many other approaches to concurrency that
we do not discuss, including data-parallel abstractions from
languages like NESL2 and those from the high-performance
computing community such as OpenMP and MPI.
Even in this restricted setting, concurrent programming
is extremely difficult. The dominant programming technique is based on locks, an approach that is simple and direct, but that does not scale with program size and
complexity. To ensure correctness, programmers must identify which operations conflict; to ensure liveness, they must
avoid introducing deadlock; to ensure good performance,
they must balance the granularity at which locking is performed against the costs of fine-grain locking.
Perhaps the most fundamental objection, though, is that
lock-based programs do not compose. For example, consider
a hash table with thread-safe insert and delete operations.
Now suppose that we want to delete one item A from table
t1, and insert it into table t2; but the intermediate state (in
which neither table contains the item) must not be visible
to other threads. Unless the implementer of the hash table
anticipates this need, there is simply no way to satisfy this requirement without somehow locking out all other accesses
to the table. One approach is to expose concurrency control
methods such as LockTable and UnlockTable—but this
breaks the hash table abstraction, and invites lock-induced
deadlock, depending on the order in which the client takes
the locks, or race conditions if the client forgets. Yet more
complexity is required if the client wants to await the presence of A in t1—but this blocking behavior must not lock
the table (else A cannot be inserted). In short, operations
that are individually correct (insert, delete) cannot be composed into larger correct operations.
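To make the difficulty concrete, here is a minimal lock-based sketch in Haskell, assuming an MVar-protected Map as a stand-in for the hash table (the Table type and moveItem are names invented for the illustration, not part of any library). The client must hold t1's lock while it also takes t2's, which is exactly what invites deadlock if another client takes the two locks in the opposite order.

import Control.Concurrent.MVar
import qualified Data.Map as Map

-- The MVar doubles as the table's lock and its contents.
type Table k v = MVar (Map.Map k v)

-- Move key k from t1 to t2 with no visible intermediate state: t2's lock
-- is taken while t1's is still held, so two clients that acquire the two
-- locks in opposite orders can deadlock.
moveItem :: Ord k => Table k v -> Table k v -> k -> IO ()
moveItem t1 t2 k =
  modifyMVar_ t1 $ \m1 ->
    case Map.lookup k m1 of
      Nothing -> return m1
      Just v  -> do
        modifyMVar_ t2 (return . Map.insert k v)   -- nested lock acquisition
        return (Map.delete k m1)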
The same phenomenon shows up trying to compose alternative blocking operations. Suppose a procedure p1 waits
for one of two input pipes to have data, using a call to the
Unix select procedure; and suppose another procedure
p2 does the same thing, on two other pipes. In Unix there
is no way to perform a select between p1 and p2, a fundamental loss of compositionality. Instead, Unix programmers
learn awkward programming techniques to gather up all the
file descriptors that must be waited for, perform a single top-level select, and then dispatch back to the correct handler.
Again, two individually correct abstractions, p1 and p2, cannot be composed into a larger one; instead, they must be
ripped apart and awkwardly merged, in direct conflict with
the goals of abstraction.
Rather than fixing locks, a more promising and radical
alternative is to base concurrency control on atomic memory transactions, also known as transactional memory. We
will show that transactional memory offers a solution to
the tension between concurrency and abstraction. For ex-
ample, with memory transactions we can manipulate the
hash table thus:
atomic {v := delete(t1, A); insert(t2, A, v)}
and to wait for either p1 or p2 we can say
atomic {p1 `orElse` p2}
These simple constructions require no knowledge of the
implementation of insert, delete, p1 or p2, and they
continue to work correctly if these operations may block, as
we shall see.
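As a preview of how such code can look in a concrete language, the two constructions above might be written in Haskell with GHC's Control.Concurrent.STM library roughly as follows. The Table type, moveItem, and getEither are names invented for the example: a TVar holding a Map stands in for the hash table, and TChans stand in for the pipes.

import Control.Concurrent.STM
import qualified Data.Map as Map

-- A transactional table: a Map held in a transactional variable.
type Table k v = TVar (Map.Map k v)

-- Delete key k from t1 and insert its value into t2; the intermediate
-- state, with the item in neither table, is never visible to other
-- threads.  retry blocks (and later re-runs) until k is present in t1.
moveItem :: Ord k => Table k v -> Table k v -> k -> STM ()
moveItem t1 t2 k = do
  m1 <- readTVar t1
  case Map.lookup k m1 of
    Nothing -> retry
    Just v  -> do
      writeTVar t1 (Map.delete k m1)
      modifyTVar' t2 (Map.insert k v)

-- Wait for whichever of two sources has data first.
getEither :: TChan a -> TChan a -> STM a
getEither p1 p2 = readTChan p1 `orElse` readTChan p2

main :: IO ()
main = do
  t1 <- newTVarIO (Map.fromList [("A", 42 :: Int)])
  t2 <- newTVarIO Map.empty
  atomically (moveItem t1 t2 "A")
  print =<< readTVarIO t2          -- fromList [("A",42)]

Because moveItem and getEither are themselves STM actions, they can in turn be combined inside larger atomically blocks without any knowledge of their implementations.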
2.1. Transactional Memory
The idea of transactions is not new. They have been a fundamental mechanism in database design for many years,
and there has been much subsequent work on transactional
memory. Larus and Rajwar provide a recent survey. 14
The key idea is that a block of code, including nested calls,
can be enclosed by an atomic block, with the guarantee that
it runs atomically with respect to every other atomic block.
Transactional memory can be implemented using optimistic
synchronization. Instead of taking locks, an atomic block
runs without locking, accumulating a thread-local transaction
log that records every memory read and write it makes. When
the block completes, it first validates its log, to check that
it has seen a consistent view of memory, and then commits
its changes to memory. If validation fails, because memory
read by the method was altered by another thread during the
block’s execution, then the block is re-executed from scratch.
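As a deliberately simplified sketch of this scheme (a toy model over Int-valued cells, not GHC's actual implementation), the following Haskell fragment runs a transaction body against a thread-local log, then validates the versions it read and commits under a single global lock, re-executing the body from scratch if validation fails.

import Control.Concurrent.MVar
import Data.IORef
import qualified Data.Map as Map

-- A versioned, Int-valued memory cell.
data Cell = Cell { ident :: Int, val :: IORef Int, ver :: IORef Int }

-- A thread-local transaction log: the version seen by each read, and the
-- buffered writes.
data TLog = TLog { readsRef  :: IORef (Map.Map Int (Cell, Int))
                 , writesRef :: IORef (Map.Map Int (Cell, Int)) }

readCell :: TLog -> Cell -> IO Int
readCell lg c = do
  ws <- readIORef (writesRef lg)
  case Map.lookup (ident c) ws of
    Just (_, v) -> return v                       -- see our own buffered write
    Nothing     -> do
      v <- readIORef (val c)
      n <- readIORef (ver c)
      modifyIORef' (readsRef lg)
                   (Map.insertWith (\_new old -> old) (ident c) (c, n))
      return v

writeCell :: TLog -> Cell -> Int -> IO ()
writeCell lg c v = modifyIORef' (writesRef lg) (Map.insert (ident c) (c, v))

-- Run the body against a fresh log, then validate and commit under one
-- global lock; if any cell we read has changed version, run again.
runAtomic :: MVar () -> (TLog -> IO a) -> IO a
runAtomic lock body = do
  lg <- TLog <$> newIORef Map.empty <*> newIORef Map.empty
  r  <- body lg
  ok <- withMVar lock $ \_ -> do
    rs    <- Map.elems <$> readIORef (readsRef lg)
    valid <- and <$> mapM (\(c, n) -> (== n) <$> readIORef (ver c)) rs
    if valid
      then do ws <- Map.elems <$> readIORef (writesRef lg)
              mapM_ (\(c, v) -> writeIORef (val c) v >> modifyIORef' (ver c) (+ 1)) ws
              return True
      else return False
  if ok then return r else runAtomic lock body

main :: IO ()
main = do
  lock <- newMVar ()
  a <- Cell 0 <$> newIORef 10 <*> newIORef 0
  b <- Cell 1 <$> newIORef 0  <*> newIORef 0
  runAtomic lock $ \lg -> do      -- transfer one unit from a to b
    x <- readCell lg a
    writeCell lg a (x - 1)
    y <- readCell lg b
    writeCell lg b (y + 1)
  print =<< readIORef (val b)

A production implementation is far more refined, but the structure is the one just described: a per-thread log of reads and writes, validation, commit, and re-execution on conflict.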
Suitably implemented, transactional memory eliminates
many of the low-level difficulties that plague lock-based programming. There are no lock-induced deadlocks (because
there are no locks); there is no priority inversion; and there is
no painful tension between granularity and concurrency. However, initial work made little progress on transactional abstractions that compose well. There are three particular problems.
Firstly, since a transaction may be rerun automatically,
it is essential that it does nothing irrevocable. For example,
the transaction
atomic {if (n > k) then launchMissiles(); S2}
might launch a second salvo of missiles if it were re-executed. It might also launch the missiles inadvertently if, say, the
thread was de-scheduled after reading n but before reading
k, and another thread modified both before the thread was
resumed. This problem begs for a guarantee that the body of
the atomic block can only perform memory operations, and
hence can only make benign modifications to its own transaction log, rather than performing irrevocable input/output.
Secondly, many systems do not support synchronization between transactions, and those that do rely on a
programmer-supplied Boolean guard on the atomic block.8
For example, a method to get an item from a buffer might be: