bare transactions, and without bonnie++, TxLinux-SS shows
the same speedup for 16 CPUs as 32 CPUs. TxLinux-CX sees
2.5% and 1% speedups over Linux for 16 and 32 CPUs. These
performance deltas are negligible and do not demonstrate a
conclusive performance increase. However, the argument for
HTM in an operating system is about reducing programming
complexity. These results show that HTM can enhance programmability without a negative impact on performance.
6.3. Priority inversion performance
Figure 4 shows how frequently transactional priority inversion occurs in TxLinux. In this case, priority inversion means
that the default SizeMatters contention management policy21
favors the process with the lower OS scheduling priority when
deciding the winning transaction in a conflict. Results for
timestamp-based contention management are similar. Most
benchmarks show that a significant percentage of transactional conflicts result in a priority inversion, with an average of 9.5% across all kernel and CPU configurations we tested,
and as high as 25% for find. Priority inversion tends to decrease
with larger numbers of processors, but the trend is not strict.
The pmake and bonnie++ benchmarks show an increase
with higher processor count. The number and distribution
of transactional conflicts is chaotic, so changing the number
of processors can change the conflict behavior. The os_prio
contention management policy eliminates priority inversion
entirely in our benchmarks, at a performance cost under 1%.
By contrast, techniques for ameliorating priority inversion
with locks such as priority inheritance only provide an upper
bound on priority inversion, and require taking the performance hit of turning polling locks into blocking locks.
The frequency with which naïve contention management
violates OS scheduling priority argues strongly for a mechanism that lets the OS participate in contention management,
e.g., by communicating hints to the hardware.
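The difference between a priority-unaware policy and an OS-aware one can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the TxLinux hardware policy: the `Txn` record, the rule that the larger transaction wins under the default policy, the convention that a larger `os_priority` value means higher scheduling priority, and the tie-break in `os_prio` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    size: int         # hypothetical working-set size in bytes
    os_priority: int  # assumption: larger value = higher scheduling priority


def size_matters(a, b):
    """Toy stand-in for a size-based policy: the larger transaction wins."""
    return a if a.size >= b.size else b


def os_prio(a, b):
    """Toy OS-aware policy: higher scheduling priority wins.

    Ties fall back to the size-based policy (an assumption of this sketch).
    """
    if a.os_priority != b.os_priority:
        return a if a.os_priority > b.os_priority else b
    return size_matters(a, b)


def is_inversion(winner, loser):
    """A conflict is an inversion when the winner has lower OS priority."""
    return winner.os_priority < loser.os_priority


def inversion_rate(conflicts, policy):
    """Fraction of conflicts that the given policy resolves as an inversion."""
    inversions = 0
    for a, b in conflicts:
        winner = policy(a, b)
        loser = b if winner is a else a
        if is_inversion(winner, loser):
            inversions += 1
    return inversions / len(conflicts)


conflicts = [
    (Txn("big_low", 4096, 1), Txn("small_high", 64, 5)),
    (Txn("big_high", 4096, 5), Txn("small_low", 64, 1)),
]
print(inversion_rate(conflicts, size_matters))
print(inversion_rate(conflicts, os_prio))
```

In this toy model, the size-based policy inverts priority whenever the larger transaction belongs to the lower-priority process, while the OS-aware policy cannot produce an inversion by construction.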
7. Related Work
Due to limited space, we refer the interested reader to the
complete discussions, 21, 23 and survey the related literature in
brief. Larus and Rajwar provide a thorough reference on TM
research through the end of 2006.12
Figure 4: Percentage of transaction restarts decided in favor of a
transaction started by the processor with lower process priority,
resulting in “transactional” priority inversion. Results shown are for
all benchmarks, for 16 and 32 processors, TxLinux-SS.
[Bar chart "Transactional priority inversion": vertical axis from 0 to 30, one group of bars per benchmark, series for 16 CPUs and 32 CPUs.]
HTM. Herlihy and Moss10 gave one of the earliest designs
for HTM; many proposals since have focused on architectural mechanisms to support HTM, 5, 8, 13, 14, 25 and language-level support for HTM. Some proposals for TM virtualization (when transactions overflow hardware resources)
involve the OS, 2, 5 but no proposals to date have allowed the
OS itself to use transactions for synchronization. This paper, however, examines the systems issues that arise when
using HTM in an OS and OS support for HTM. Rajwar and
Goodman explored speculative19 and transactional20 execution of critical sections. These mechanisms for falling back
on locking when isolation is violated are similar to (but less
general than) the cxspinlock technique of executing in a
transactional context and reverting to locking when I/O is
detected.
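The fallback idea described above, running a critical section speculatively and re-executing it under a lock once I/O is detected, can be sketched as a toy model. This is a sketch under stated assumptions and not the cxspinlock implementation or its API: `CxLock`, `IODetected`, `run`, and `do_io` are hypothetical names, and the sketch does not model memory rollback, so the critical sections here have no side effects before the I/O point.

```python
import threading


class IODetected(Exception):
    """Raised by the toy 'hardware' when a transaction attempts I/O."""


class CxLock:
    """Toy model: try transactionally first, fall back to a lock on I/O."""

    def __init__(self):
        self._lock = threading.Lock()
        self.in_txn = False

    def do_io(self, value):
        # I/O cannot be rolled back, so it is refused inside a transaction.
        if self.in_txn:
            raise IODetected()
        return value

    def run(self, critical_section):
        # First attempt: speculative execution in a transactional context.
        self.in_txn = True
        try:
            return "transaction", critical_section(self)
        except IODetected:
            pass  # toy rollback: the speculative attempt is discarded
        finally:
            self.in_txn = False
        # Second attempt: exclusive execution under the lock, I/O allowed.
        with self._lock:
            return "lock", critical_section(self)


cx = CxLock()
print(cx.run(lambda c: 1 + 1))       # no I/O: completes as a transaction
print(cx.run(lambda c: c.do_io(7)))  # I/O: restarts under the lock
```

The caller writes the critical section once and does not need to know in advance whether it performs I/O; the mode is chosen at run time.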
I/O in transactions. Proposals for I/O in transactions fall
into three basic camps: give transactions an isolation escape
hatch, 15–17 delay the I/O until the transaction commits, 8, 9 and
guarantee that the thread performing I/O will commit. 2, 8 All
of these strategies have serious drawbacks.11 Escape hatches introduce complexity and correctness conditions that
restrict the programming model and are easy to violate in
common programming idioms. Delaying I/O is not possible
when the code performing the I/O depends on its result, e.g.,
a device register read might return a status word that the OS
must interpret in order to finish the transaction. Finally,
guaranteeing that a transaction will commit severely limits scheduler flexibility, and can, for long-running or highly
contended transactions, result in serial bottlenecks or deadlock. Non-transactional threads on other processors that
conflict with the guaranteed thread will be forced to stall until the
guaranteed thread commits its work. This will likely lead to
lost timer interrupts and deadlock in the kernel.
Scheduling. Operating systems such as Microsoft Windows, Linux, and Solaris implement sophisticated, priority-based, pre-emptive schedulers that provide different classes
of priorities and a variety of scheduling techniques for each
class. The Linux RT patch supports priority inheritance to
help mitigate the effects of priority inversion: while our
work also addresses priority inversion, the Linux RT patch
implementation converts spinlocks to mutexes. While these
mechanisms guarantee an upper bound on priority inversion, the os_prio policy allows the contention manager to effectively eliminate priority inversion without requiring the
primitive to block or involve the scheduler.
8. Conclusion
This paper is the first description of an operating system
that uses HTM as a synchronization primitive, and presents innovative techniques for HTM-aware scheduling and
cooperation between locks and transactions. TxLinux demonstrates that HTM provides comparable performance to
locks, and can simplify code while coexisting with other
synchronization primitives in a modern OS. The cxspinlock
primitive enables a solution to the long-standing problem of
I/O in transactions, and the API eases conversion from locking primitives to transactions significantly. Introduction of