greatest scheduling value to the OS. Given the small number
of scheduling priority values, ties in conflict priority will not
be rare, so os_prio next employs timestamp. This hybrid contention management policy induces a total order on transactions and is therefore livelock-free.
A single register in the Meta TM architecture allows the OS
to communicate its scheduling priorities to the contention
manager. TxLinux encodes a process’ dynamic scheduling
priority and scheduling policy into a single integer called the
conflict priority, which it writes to a privileged status register
during the process of scheduling the process. The register
can only be written by the OS so the user code cannot change
the value. The value of the register does not change during
a scheduling quantum. For instance, the scheduling policy
might be encoded in the upper bits of the conflict priority
and the scheduling priority in the lower bits. An 8-bit value is
sufficient to record the policies and priority values of Linux
processes. Upon detecting a conflict, the os_prio contention
manager favors the transaction whose conflict priority value
is the largest.
The os_prio policy is free from deadlock and livelock because the conflict priority is computed before the xbegin
instruction is executed, and the OS never changes the conflict priority during the lifetime of the transaction (some designs allow transactions to continue for multiple scheduling
quanta). When priorities are equal, os_prio defaults to
SizeMatters, which defaults to timestamp when read–write set
sizes are equal. Hence, the tuple (conflict priority, size, age)
induces a total order, making the os_prio policy free of deadlock and livelock.
6. eValuation
Linux and TxLinux versions 2. 6. 16. 1 run on the Simics machine simulator version 3.0.27. In the following experiments,
Simics models an x86 SMP machine with 16 and 32 processors and an IPC of one instruction per cycle. The memory hierarchy uses per-processor split instruction and data caches
(16KB with 4-way associativity, 64-byte cache lines, 1-cycle
cache hit and 16-cycle miss penalties). The level one data
cache has extra tag bits to manage transactional data. There
is a unified second level cache that is 4 MB, 8-way associative,
with 64-byte cache lines, and a 200 cycle miss penalty to main
memory. Coherence is maintained with a transactional MESI
snooping protocol, and main memory is a single shared 1 GB.
The disk device models PCI bandwidth limitations, DMA
data transfer, and has a fixed 5. 5 ms access latency. All benchmarks are scripted, requiring no user interaction.
6. 1. Workloads
We evaluated TxLinux-SS and TxLixux-CX on a number of application benchmarks. Where P denotes the number of processors, these are:
• pmake (make -j P to compile part of libFLAC source
tree)
• MAB (modififed andrew benchmark, P instances, no
compile phase)
• configure (configure script for a subset of TeTeX, P
instances)
• find (Search 78MB directory ( 29 dirs, 968 files), P
instances)
• bonnie++ (models filesystem activity of a web cache, P
instances)
• dpunish (a filesystem stress test, with operations split
across P processes)
It is important to note that none of these benchmarks uses
transactions directly; the benchmarks exercise the kernel,
which in turn, uses transactions for synchronization.
6. 2. Performance
Figure 3 shows the synchronization performance for Linux,
Tx-Linux-SS (using bare transactions) and TxLixux-CX (
using cxspin-locks) for 16 CPUs, broken down into time spent
spinning on locks and time spent aborting and retrying
transactions. Linux spends between 1% and 6% of kernel
time synchronizing, while TxLinux spends between 1% and
12% of kernel time synchronizing. However, on average, TxLinux reduces synchronization time by 34% and 40% with
transactions and cxspinlocks, respectively. While HTM
generally reduces synchronization overhead, it more than
double the time lost for bonnie++. This loss is due to transactions that restart, back-off, but continue to fail. Since
bonnie++ does substantial creation and deletion of small files
in a single directory, the resulting contention in file system
code paths results in pathological restart behavior in the
file system code that handles updates to directories. High
contention on kernel data structures causes a situation in
which repeated back-off and restart effectively starves a few
transactions. Using back-off before restart as a technique to
handle such high contention may be insufficient for complex systems: the transaction system may need to queue
transactions that consistently do not complete.
Averaged over all benchmarks, TxLinux-SS shows a 2% slowdown over Linux for 16 CPUs and a 2% speedup for 32 CPUs.
The slowdown in the 16 CPU case results from the pathological restart situation in the bonnie++ benchmark, discussed
above; the pathology is not present in the 32 CPU case with
Linux
TxLinux-xs
TxLinux-cx
Linux
TxLinux-xs
TxLinux-cx
Linux
TxLinux-xs
TxLinux-cx
Linux
TxLinux-xs
TxLinux-cx
Linux
TxLinux-xs
TxLinux-cx
Linux
TxLinux-xs
TxLinux-cx
figure 3: synchronization performance for 16 cPus, txlinux-ss, and
txlinux-cX.
Percent of Kernel time spent synchronizing
14
12
10 aborts
spins
8
6
4
2
0