technical Perspective
transactional memory
in the operating system
By Mark Moir
THE LONG TRADITION of building ever-faster processors is ending, with the
computer industry instead putting
more processing “cores” on each processor chip. Therefore, to continue to
benefit from advances in technology,
applications must increasingly be able
to perform useful work on multiple
cores at the same time.
To use multiple cores concurrently,
programmers must not only identify
independent pieces of a task that can
be executed in parallel, but also coordinate their execution, managing
communication and synchronization
between them. A program with a communication or synchronization
bottleneck will be unable to take full advantage of the available cores. Scalable
programs that avoid such bottlenecks
are surprisingly difficult to construct.
The complex interactions between
concurrently executing pieces of code
are often managed by the careful use
of locks to ensure a critical section of
code on one core executes “
atomically,” that is, without interference from
other cores. However, it is challenging
to avoid synchronization bottlenecks
with this approach, and in many cases
the overhead of such bottlenecks negates the advantage gained by running
pieces of the application concurrently.
Research in Transactional Memory
(TM) has allowed programmers to express that a critical section of code
should be executed atomically, without
specifying how this should be achieved.
The term “Transactional Memory”
originated with seminal work by Herlihy and Moss (ISCA ‘ 93), in which they
proposed to apply the idea of “
transactions” (already popular in database
systems) to computer memory.
Numerous variations on this theme
have emerged in the literature, with
some proposing TM be supported entirely in hardware, others entirely in
software, and yet others using some
combination of the two. Some propose extensions to programming lan-
guages, implemented with a combination of compiler support and runtime
libraries, while others propose library
interfaces, and yet others propose instruction set extensions that are programmed directly.
TM is generally “optimistic.” Rather
than pessimistically preventing critical
sections from executing concurrently
just in case they interfere with others,
as locks do, TM allows them to execute
concurrently by default. The TM system monitors concurrent critical sections to detect any “conflict” between
them. When conflicts do arise, the TM
system forces one or more of the critical sections to either wait or roll back,
so that no interference occurs. This is
done automatically by the TM system.
To date, most research has focused
on using TM in user programs, and
emerging TM proposals have mostly
been evaluated using rather artificial
toy applications and benchmarks designed to explore the various trade-offs
that arise in TM design.
In contrast, the research described
here by Rossbach et al. explores the
use of TM in an operating system. Such
research is important for a number of
reasons. First, its provides significant
value to the TM research community by
creating TM-based programs (
operating systems) that are larger, more complex, and more realistic than the mostly
artificial workloads used for evaluating
TM systems so far. Second, TM has the
potential to improve the performance
and scalability of the operating system
itself, even while making it simpler. Because all user programs depend on the
operating system, its scalability and
correctness is critical to the success of
multicore systems.
TM research for user programs will
not necessarily result in the best TM
support for operating systems. On the
one hand, the operating system faces
some challenges that user programs do
not. In particular, the operating system
must perform reliably while running a
variety of user programs simultaneously andthus cannot exploit knowledge of
specific ones to improve performance
and scalability. Furthermore, operating
systems employ a wider variety of synchronization mechanisms than typical
user programs do, and must also deal
with a variety of complicated issues on
behalf of user programs, such as access
to low-level system devices.
However, the operating system can
exploit its privileged role in the system
in ways that user programs cannot. For
example, for security reasons, a user
program cannot prevent the operating
system from interrupting it, while the
operating system has full control of the
cores and can thus prevent interruption while finishing an important task.
The authors built two operating
systems that exploit TM. They first attempted to use transactions directly
instead of existing synchronization
mechanisms. This approach entailed a
number of challenges, particularly related to the fact that locks are used for
purposes other than preventing memory conflicts between critical sections,
such as protecting system devices for
input and output (I/O). TM systems are
not yet mature enough to support such
functionality, and the lessons learned
here are valuable for TM researchers.
The authors then took a more pragmatic approach: they retained existing synchronization mechanisms, and
used TM to attempt to improve their
performance and scalability. This way,
they could use the existing locks when
necessary—for example, for I/O—while
also exploiting TM in other cases to allow parallelism that would previously
have been impossible. Given the thousands of person-years spent carefully
engineering modern operating systems, and the limited nature of the first
TM systems, this is likely to be the most
practical approach to exploiting TM in
operating systems for some time.
By exploring both approaches, the
authors not only contribute to our understanding of how TM can be used in
large and realistic code bases, but also
provide valuable insight into what additional TM features may be useful or
necessary for optimal support of such
systems in the longer term.
Mark Moir ( Mark.Moir@sun.com) is a senior staff engineer
at Sun Microsystems Laboratories, Burlington, MA.