state of a device (e.g., by writing data to the network). Most
devices cannot revert to a previous state once a write operation completes, so a transaction that performs I/O cannot
be rolled back and re-executed. The second reason is that
some kernel critical sections are highly contended, and for such sections locks are currently more efficient than transactions. Under contention, the optimism of transactions is unwarranted, and the rollbacks and
back-off performed by the TM system can significantly reduce performance.
The cxspinlock (cooperative transactional spinlock) is a
new primitive that addresses the problem of I/O in transactions, allowing locks and transactions to work together
to protect the same data while maintaining both of their
advantages. Previous HTM proposals require every execution of a critical section to be protected by either a lock or
a transaction, while cxspinlocks allow a critical section or
a data structure accessed from different critical sections to
sometimes be protected by a lock and sometimes by a transaction. Cxspinlocks dynamically and automatically choose
between locks and transactions. Cxspinlocks attempt to execute critical sections as transactions by default, but when
the processor detects an I/O attempt, the transactions are
rolled back, and the cxspinlock will ensure that the thread
re-executes the critical section exclusively, blocking other
transactional and non-transactional threads. Additionally,
cxspinlocks provide a convenient API for converting lock-based code to use transactions.
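The policy above can be sketched in plain C. This is a simulated model, not a real cxspinlock: the struct layout, function names, and the boolean "I/O detected" signal are illustrative assumptions, whereas real cxspinlocks rely on HTM hardware to detect I/O attempts and abort the transaction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated cxspinlock: try the critical section as a transaction
 * first; on an I/O attempt, restart it holding the lock exclusively.
 * All names here are assumptions for illustration. */
typedef struct { bool held_exclusive; } cxspinlock;

/* Critical-section body: returns false to report an I/O attempt
 * while running transactionally (simulating a hardware abort). */
typedef bool (*cs_body)(bool exclusive, void *arg);

static void cx_run(cxspinlock *l, cs_body body, void *arg) {
    /* First attempt: optimistic, transactional execution, allowed
     * only while no thread holds the lock exclusively. */
    if (!l->held_exclusive && body(false, arg))
        return;                    /* transaction committed */
    /* I/O was attempted: re-execute exclusively, blocking both
     * transactional and non-transactional threads. */
    l->held_exclusive = true;
    body(true, arg);
    l->held_exclusive = false;
}

static int io_writes = 0;

/* Example body that performs (simulated) device I/O. */
static bool log_to_device(bool exclusive, void *arg) {
    (void)arg;
    if (!exclusive)
        return false;              /* abort: I/O inside a transaction */
    io_writes++;                   /* safe: running exclusively */
    return true;
}
```

In this model, `cx_run` plays the role of the transactional attempt plus the exclusive fallback that the text describes; a body that never performs I/O would simply commit on the first, optimistic pass.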
HTM enables a solution to the long-standing problem of
priority inversion due to locks. Priority inversion occurs when
a high priority thread waits for a lock held by a low priority
thread. We demonstrate the modifications necessary in the
TxLinux scheduler and the TM hardware to nearly eliminate
priority and policy inversion. Moreover, the OS can improve
its scheduling algorithms to help manage high contention
by leveraging a thread’s transaction history to calculate the
thread’s dynamic priority or de-schedule conflicting threads.
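One way a scheduler might fold transaction history into dynamic priority can be sketched as follows. The weighting and the field names are assumptions for illustration, not TxLinux's actual formula; the idea is only that threads with a history of conflicts are deprioritized, with the penalty bounded so no thread starves.

```c
#include <assert.h>

/* Hypothetical per-thread bookkeeping; lower value = higher
 * priority, following the Linux convention. */
struct task {
    int static_prio;   /* base priority */
    int tx_restarts;   /* recent transaction restarts (conflicts) */
    int tx_commits;    /* recent successful commits */
};

/* Penalize threads that keep conflicting so they interfere less with
 * others; the cap bounds the effect. Weights are illustrative. */
static int dynamic_prio(const struct task *t) {
    int penalty = t->tx_restarts - t->tx_commits;
    if (penalty < 0) penalty = 0;
    if (penalty > 5) penalty = 5;
    return t->static_prio + penalty;
}
```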
This paper makes the following contributions:
1. Creation of a transactional operating system, TxLinux,
based on the Linux kernel. TxLinux is among the largest real-life programs that use HTM, and the first to
use HTM inside a kernel.
2. Novel mechanism for cooperation between transactional and lock-based synchronization of a critical
region. The cooperative transactional spinlock (
cxspinlock) can be called from a transactional or non-transactional thread, and it exploits the greater
parallelism enabled by transactions.
3. Novel mechanism for handling I/O within transactions: transactions that perform I/O are restarted by the hardware and acquire a conventional lock in order to re-execute the critical section exclusively.
4. HTM mechanism to nearly eliminate priority inversion.
2. HTM PRIMER
This section provides background on parallel programming with locks and gives an overview of programming with HTM.
2.1. Threads, synchronization, and locks
Current parallel programming practices rely heavily on the
thread abstraction. A thread is a sequential flow of control,
with a private program counter and call stack. Multiple
threads may share a single address space, allowing them
to communicate through memory using shared variables.
Threads make it possible for a single logical task to take advantage of multiple hardware processors, for example, by moving subsets of the task to different processing
contexts and executing them in parallel. Threads allow an
application to remain responsive to users or get other work
done while waiting for input from a slow device such as a
disk drive or a human being. Multiple processors are the parallel computing resource at the hardware level; multiple threads are the parallel computing resource at the operating system level.
Threads require synchronization when sharing data or
communicating through memory to avoid race conditions.
A race condition occurs when threads access the same data
structure concurrently in a way that violates the invariants
of the data structure. For instance, a race condition between
two threads inserting into a linked list could create a loop
in the list. Synchronization is the coordination that eliminates race conditions and maintains data structure invariants (e.g., that every list is null-terminated). Locks allow threads to
synchronize concurrent accesses to a data structure. A lock
protects a data structure by enforcing mutual exclusion, ensuring that only one thread can access that data structure at
a time. When a thread has exclusive access to a data structure, it is guaranteed not to see partially completed changes
made by other threads. Locks thus help maintain consistency over shared variables and resources.
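The mutual-exclusion idea above can be made concrete with a standard pthreads sketch. Without the lock, concurrent increments of `counter` race, because the read-modify-write is not atomic; with the lock, the invariant "counter equals the number of completed increments" holds. The names and constants here are illustrative.

```c
#include <assert.h>
#include <pthread.h>

enum { THREADS = 4, ITERS = 100000 };

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *incrementer(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERS; i++) {
        pthread_mutex_lock(&counter_lock);   /* enter critical section */
        counter++;                           /* protected update */
        pthread_mutex_unlock(&counter_lock); /* leave critical section */
    }
    return NULL;
}

/* Run THREADS threads, each performing ITERS locked increments,
 * and return the final counter value. */
static long run_counter_demo(void) {
    pthread_t t[THREADS];
    counter = 0;
    for (int i = 0; i < THREADS; i++)
        pthread_create(&t[i], NULL, incrementer, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(t[i], NULL);
    return counter;
}
```

Removing the lock/unlock pair typically makes the final count fall short of `THREADS * ITERS`, which is exactly the kind of lost-update race condition the text describes.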
Locks introduce many challenges into the programming model, such as deadlock and priority inversion.
Most importantly though, they are often a mismatch for
the programmer’s real needs and intent: a critical section
expresses a consistency constraint, while a lock provides
exclusion. Ultimately, when a programmer encloses a set
of instructions in a critical section, it represents the assessment that those instructions must be executed atomically
(either completely, or not at all), and in isolation (without
visible partial updates) in order to preserve the consistency
of the data manipulated by the critical section. HTM provides hardware and software support for precisely that
abstraction: atomic, isolated execution of critical sections.
Locks can provide that abstraction conservatively by ensuring that no two threads are ever executing in the critical section concurrently. By contrast, TM provides this abstraction
optimistically, by allowing concurrent execution of critical
sections, detecting violations of isolation dynamically, and
restarting one or more transactions in response, reverting state changes done in a transaction if the transaction
does not commit. The result is a globally consistent order of transactions.
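The optimistic read-check-commit cycle described above already exists in miniature in the familiar compare-and-swap retry loop, which TM generalizes from a single word to whole critical sections, with the hardware performing the conflict detection. A minimal sketch, with illustrative names:

```c
#include <assert.h>
#include <stdatomic.h>

static _Atomic long balance = 100;

/* Optimistically add `delta` to `balance`: read a snapshot, compute
 * the new value privately, and commit only if no other thread changed
 * the word in the meantime; otherwise restart. */
static long optimistic_add(long delta) {
    long old, updated;
    do {
        old = atomic_load(&balance);       /* read snapshot */
        updated = old + delta;             /* compute privately */
        /* commit iff balance still equals old; on conflict, retry */
    } while (!atomic_compare_exchange_weak(&balance, &old, updated));
    return updated;
}
```

As with TM, a conflicting writer does not block this loop; it merely forces a restart, and the committed updates form a consistent total order.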
There are many lock variants, like reader/writer locks
and sequence locks. These lock variants reduce the amount
of exclusion for a given critical section which can improve
performance by allowing more threads to concurrently execute a critical region. However, these variants can be used