TxLinux and MetaTM:
Transactional Memory and
the Operating System

By Christopher J. Rossbach, hany E. Ramadan, owen S. hofmann, Donald E. Porter, Aditya Bhandari, and Emmett Witchel

abstract

TxLinux is the first operating system to use hardware transactional memory (HTM) as a synchronization primitive, and the first to manage HTM in the scheduler. TxLinux, which is a modification of Linux, is the first real-scale benchmark for transactional memory (TM). Meta TM is a modification of the x86 architecture that supports HTM in general and TxLinux specifically.

This paper describes and measures TxLinux and Meta TM, the HTM model that supports it. TxLinux greatly benefits from a new primitive, called the cooperative transactional spinlock (cxspinlock) that allows locks and transactions to protect the same data while maintaining the advantages of both synchronization primitives. Integrating the TxLinux scheduler with the Meta TM’s architectural support for HTM eliminates priority inversion for several real-world benchmarks.

1. intRoDuction

To increase performance, hardware manufacturers have turned away from scaling clock speed and are focusing on scaling the number of cores on a chip. Increasing performance on new hardware will require finding ways to take advantage of the parallelism made available by multiple hardware processing contexts—a burden placed directly on the software programmer. New generations of hardware will not increase the performance of user applications unless something is done to make concurrent programming easier, so the need for accessible approaches to parallel programming is increasingly urgent.

The current approach to achieving concurrency using parallel programming relies heavily on threading. Multiple sequential flows of control (threads) execute at the same time using locks to protect critical sections. Locks guarantee mutually exclusive access to shared resources. Unfortunately, parallel programming using threads and locks remains quite difficult, even for experienced programmers. Locks suffer from a number of well-known and long-lamented problems such as deadlock, convoys, and priority inversion; they compose poorly and require complex ordering disciplines to coordinate the use of multiple locks. There is also an unattractive performance-complexity trade-off associated with locks. Coarse-grain locking is simple to reason about but sacrifices concurrent performance. Fine-grain locking may enable high performance, but it makes code more complex, harder to maintain because it is dependent

on invariants that are difficult to express or enforce. TM has been the focus of much recent research attention as a technique that can provide the performance of fine-grain locking with the code complexity of coarse-grain locking.

TM is a programming model that can greatly simplify parallel programming. A programmer demarcates critical sections that may access shared data as transactions, which are sequences of memory operations that either execute completely (commit) or have no effect (abort). The system is responsible for ensuring that transactions execute atomically (either completely or not at all), and in isolation, meaning that a transaction cannot see the effects of other active transactions, and its own operations are not visible in the system until it commits. While transactions provide the abstraction of completely serial execution of critical section, the system actually executes them optimistically, allowing multiple transactions to proceed concurrently, as long as atomicity and isolation are not violated. The programmer benefits because the system provides atomicity: reasoning about partial failures in critical sections is no longer necessary. Because transactions can be composed, and do not suffer from deadlock, programmers can freely compose thread-safe libraries based on transactions.

HTM provides an efficient hardware implementation of TM that is appropriate for use in an OS. Operating systems benefit from using TM because TM provides a simpler programming model than locks. For instance, operating system has locking disciplines that specify the order in which locks must be acquired to avoid deadlock. These disciplines become complex over time and are difficult for programmers to master; transactions require no ordering disciplines. Because many applications spend a significant fraction of their runtime in the kernel (by making system calls, e.g., to read and write files), another benefit of TM in the OS is increased performance for user programs without having to modify or recompile them.

However, management and support of HTM in an operating system requires innovation both in the architecture and the operating system. Transactions cannot simply replace or eliminate locks in an operating system for two main reasons. The first is that many kernel critical sections perform I/O, actually changing the state of devices like the disk or network card. I/O is a problem for TM because TM systems assume that if a conflict occurs, one transaction can be aborted, rolled back to its start, and re-executed. However, when the OS performs I/O it actually changes the

References:

Archives