TxLinux and MetaTM:
Transactional Memory and
the Operating System
By Christopher J. Rossbach, hany E. Ramadan, owen S. hofmann, Donald E. Porter,
Aditya Bhandari, and Emmett Witchel
abstract
TxLinux is the first operating system to use hardware transactional memory (HTM) as a synchronization primitive, and
the first to manage HTM in the scheduler. TxLinux, which
is a modification of Linux, is the first real-scale benchmark
for transactional memory (TM). Meta TM is a modification of
the x86 architecture that supports HTM in general and TxLinux specifically.
This paper describes and measures TxLinux and Meta TM,
the HTM model that supports it. TxLinux greatly benefits
from a new primitive, called the cooperative transactional
spinlock (cxspinlock) that allows locks and transactions to
protect the same data while maintaining the advantages
of both synchronization primitives. Integrating the TxLinux scheduler with the Meta TM’s architectural support for
HTM eliminates priority inversion for several real-world
benchmarks.
1. intRoDuction
To increase performance, hardware manufacturers have
turned away from scaling clock speed and are focusing on
scaling the number of cores on a chip. Increasing performance on new hardware will require finding ways to take advantage of the parallelism made available by multiple hardware processing contexts—a burden placed directly on the
software programmer. New generations of hardware will not
increase the performance of user applications unless something is done to make concurrent programming easier, so
the need for accessible approaches to parallel programming
is increasingly urgent.
The current approach to achieving concurrency using
parallel programming relies heavily on threading. Multiple
sequential flows of control (threads) execute at the same
time using locks to protect critical sections. Locks guarantee
mutually exclusive access to shared resources. Unfortunately, parallel programming using threads and locks remains
quite difficult, even for experienced programmers. Locks
suffer from a number of well-known and long-lamented
problems such as deadlock, convoys, and priority inversion; they compose poorly and require complex ordering
disciplines to coordinate the use of multiple locks. There is
also an unattractive performance-complexity trade-off associated with locks. Coarse-grain locking is simple to reason
about but sacrifices concurrent performance. Fine-grain
locking may enable high performance, but it makes code
more complex, harder to maintain because it is dependent
on invariants that are difficult to express or enforce. TM has
been the focus of much recent research attention as a technique that can provide the performance of fine-grain locking with the code complexity of coarse-grain locking.
TM is a programming model that can greatly simplify
parallel programming. A programmer demarcates critical
sections that may access shared data as transactions, which
are sequences of memory operations that either execute
completely (commit) or have no effect (abort). The system
is responsible for ensuring that transactions execute
atomically (either completely or not at all), and in isolation, meaning that a transaction cannot see the effects of other active
transactions, and its own operations are not visible in the
system until it commits. While transactions provide the abstraction of completely serial execution of critical section,
the system actually executes them optimistically, allowing
multiple transactions to proceed concurrently, as long as
atomicity and isolation are not violated. The programmer
benefits because the system provides atomicity: reasoning
about partial failures in critical sections is no longer necessary. Because transactions can be composed, and do not suffer from deadlock, programmers can freely compose thread-safe libraries based on transactions.
HTM provides an efficient hardware implementation of
TM that is appropriate for use in an OS. Operating systems
benefit from using TM because TM provides a simpler programming model than locks. For instance, operating system
has locking disciplines that specify the order in which locks
must be acquired to avoid deadlock. These disciplines become complex over time and are difficult for programmers
to master; transactions require no ordering disciplines. Because many applications spend a significant fraction of their
runtime in the kernel (by making system calls, e.g., to read
and write files), another benefit of TM in the OS is increased
performance for user programs without having to modify or
recompile them.
However, management and support of HTM in an operating system requires innovation both in the architecture
and the operating system. Transactions cannot simply
replace or eliminate locks in an operating system for two
main reasons. The first is that many kernel critical sections
perform I/O, actually changing the state of devices like the
disk or network card. I/O is a problem for TM because TM
systems assume that if a conflict occurs, one transaction
can be aborted, rolled back to its start, and re-executed.
However, when the OS performs I/O it actually changes the