to identify programming idioms which ensure that one can
reason in terms of a traditional interleaving model of concurrency, showing that any relaxed memory execution is
equivalent to one that is possible above a sequentially consistent memory model. One common idiom with this property is data-race freedom. Informally, a program has a data
race if multiple threads can access the same location (where
at least one is writing to the location) without a synchronization operation separating the accesses. Programs where
every shared access is in a critical section are one common
example of DRF programs.
A variety of relaxed models, both for processors and for
programming languages, have been proved to support sequentially consistent semantics for DRF programs. 8, 9, 10, 12, 16, 23
Saraswat et al. 30 call supporting sequentially consistent
semantics for DRF programs the “fundamental property”
of a relaxed memory model, and indeed memory models
have sometimes been defined in these terms. 6 However,
for a processor architecture, we prefer to define a memory
model that is applicable to arbitrary programs, to support
reasoning about low-level code, and have results about well-behaved programs as theorems above it.
The details of what constitutes a data race, or a synchronization operation, vary from model to model. For x86-TSO,
we define two events on different threads to be competing if
they access the same address, one is a write, and the other
is a read (for aligned x86 accesses, it is not necessary to consider write/write pairs as competing). We say that a program
is data race free if it is impossible for a competing read/write
pair to execute back-to-back. Critically, we require this property only of sequentially consistent executions (equivalently,
the x86-TSO executions where store buffers are always
flushed immediately after each write).
We have proved that x86-TSO supports interleaving
semantics for DRF programs. However, this theorem alone
is not often useful, because most programs do contain data
races at this level of abstraction. For example, the read in the
spin loop of Section 4’s spinlock races with the write in the
release. We have, therefore, identified an extended notion
of data race freedom that the spinlock code does satisfy,
and we have used it to prove that, for well-synchronized programs using the spinlock, every x86-TSO execution has an
equivalent sequentially consistent execution. 26
Thus, the relaxed nature of x86-TSO is provably not a
concern for low-level systems code that uses spinlocks to
synchronize. Extending this result to other synchronization
primitives, and to code compiled from high-level languages,
is a major topic for future work.
6. ReLateD WoRK
There is an extensive literature on relaxed memory models,
but most of it does not address x86. We touch here on some
of the most closely related work.
There are several surveys of weak memory models, including those by Adve and Gharachorloo, 6 Luchango, 23 and
Higham et al. 19 The latter, in particular, formalizes a range
of models, including a TSO model, in both operational and
axiomatic styles, and proves equivalence results. Their axiomatic TSO model is rather closer to the operational style
than ours is, and is idealized rather than x86-specific. Park
and Dill28 verify programs by model checking them directly
above TSO. Burckhardt and Musuvathi13, App. A also give operational and axiomatic definitions of a TSO model and prove
equivalence, but only for finite executions. Their models
treat memory reads and writes and barrier events, but lack
instruction semantics and LOCK’d instructions with multiple events that happen atomically. Hangal et al. 18 describe
the Sun TSOtool, checking the observed behavior of pseudo-randomly generated programs against a TSO model. Roy
et al. 29 describe an efficient algorithm for checking whether
an execution lies within an approximation to a TSO model,
used in Intel’s Random Instruction Test (RIT) generator.
Loewenstein et al. 22 describe a “golden memory model” for
SPARC TSO, somewhat closer to a particular implementation microarchitecture than the abstract machine we give in
Section 3, that they use for testing implementations. They
argue that the additional intensional detail increases the
effectiveness of simulation-based verification. Boudol and
Petri12 give an operational model with hierarchical write buffers (thereby permitting IRIW behaviors), and prove sequential consistency for DRF programs. Burckhardt et al. 14 define
an x86 memory model based on IWP. 4 The mathematical
form of their definitions is rather different to our axiomatic
and abstract-machine models, using rewrite rules to reorder
or eliminate memory accesses in sets of traces. Their model
validates the 10 IWP tests and also some instances of IRIW
(depending on how parallel compositions are associated),
so it will not coincide with x86-TSO or x86-CC. Saraswat et
al. 30 also define memory models in terms of local reordering,
and prove a DRF theorem, but focus on high-level languages.
7. concLusion
We have described x86-TSO, a memory model for x86 processors that does not suffer from the ambiguities, weaknesses,
or unsoundnesses of earlier models. Its abstract-machine
definition should be intuitive for programmers, and its
equivalent axiomatic definition supports the memevents
exhaustive search and permits an easy comparison with
related models; the similarity with SPARCv8 suggests x86-
TSO is strong enough to program above. This work highlights the clarity of mathematically rigorous definitions, in
contrast to informal prose, for subtle loose specifications.
We do not speak for any x86 vendor, and it is, of course,
entirely possible that x86-TSO is not a good description of
some existing or future x86 implementation (we would be
very interested to hear of any such example). Nonetheless, we
hope that this will clarify the semantics of x86 architectures
as they exist, for systems programmers, hardware developers,
and those working on the verification of concurrent software.
acknowledgments
We thank Luc Maranget for his work on memevents and
litmus, Tom Ridge, Thomas Braibant and Jade Alglave for
their other work on the project, and Hans Boehm, David
Christie, Dave Dice, Doug Lea, Paul Loewenstein, and Gil
Neiger for helpful remarks. We acknowledge funding from
EPSRC grants EP/F036345 and EP/H005633 and ANR grant
ANR-06-SETI-010-02.