DOI: 10.1145/1785414.1785443
x86-TSO: A Rigorous and
Usable Programmer’s Model
for x86 Multiprocessors
Abstract
Exploiting the multiprocessors that have recently become
ubiquitous requires high-performance and reliable concurrent systems code, for concurrent data structures, operating
system kernels, synchronization libraries, compilers, and
so on. However, concurrent programming, which is always
challenging, is made much more so by two problems. First,
real multiprocessors typically do not provide the sequentially consistent memory that is assumed by most work
on semantics and verification. Instead, they have relaxed
memory models, varying in subtle ways between processor families, in which different hardware threads may have
only loosely consistent views of a shared memory. Second,
the public vendor architectures, supposedly specifying what
programmers can rely on, are often in ambiguous informal
prose (a particularly poor medium for loose specifications),
leading to widespread confusion.
In this paper we focus on x86 processors. We review several recent Intel and AMD specifications, showing that all
contain serious ambiguities, some are arguably too weak to
program above, and some are simply unsound with respect
to actual hardware. We present a new x86-TSO programmer’s
model that, to the best of our knowledge, suffers from none
of these problems. It is mathematically precise (rigorously
defined in HOL4) but can be presented as an intuitive abstract
machine which should be widely accessible to working programmers. We illustrate how this can be used to reason
about the correctness of a Linux spinlock implementation
and describe a general theory of data-race freedom for x86-
TSO. This should put x86 multiprocessor system building on
a more solid foundation; it should also provide a basis for
future work on verification of such systems.
1. Introduction
Multiprocessor machines, with many processors acting on a
shared memory, have been developed since the 1960s; they are
now ubiquitous. Meanwhile, the difficulty of programming
concurrent systems has motivated extensive research on
programming language design, semantics, and verification,
from semaphores and monitors to program logics, software
model checking, and so forth. This work has almost always
assumed that concurrent threads share a single sequentially
consistent memory [21], with their reads and writes interleaved
in some order. In fact, however, real multiprocessors use
sophisticated techniques to achieve high performance: store
buffers, hierarchies of local cache, speculative execution,
etc. These optimizations are not observable by sequential
code, but in multithreaded programs different threads may
see subtly different views of memory; such machines exhibit
relaxed, or weak, memory models [6, 7, 17, 19].
For example, consider the following assembly-language program (SB, for store buffering), in which two hardware threads each write 1 to one shared location and then read the other:

Proc 0            Proc 1
MOV [x]←1         MOV [y]←1
MOV EAX←[y]       MOV EBX←[x]
Allowed Final State: Proc 0:EAX=0 ∧ Proc 1:EBX=0
Microarchitecturally, one can view this particular example
as a visible consequence of store buffering: if each processor effectively has a FIFO buffer of pending memory writes
(to avoid the need to block while a write completes), then the
reads from y and x could occur before the writes have propagated from the buffers to main memory.
Other families of multiprocessors, dating back at least
to the IBM 370, and including ARM, Itanium, POWER, and
SPARC, also exhibit relaxed-memory behavior. Moreover,
there are major and subtle differences between different processor families (arising from their different internal design
choices): in the details of exactly what non-sequentially-consistent executions they permit, and of what memory barrier
and synchronization instructions they provide to let the programmer regain control.
For any of these processors, relaxed-memory behavior
exacerbates the difficulties of writing concurrent software,
as systems programmers cannot reason, at the level of
abstraction of memory reads and writes, in terms of an intuitive concept of global time.
This paper is based on work that first appeared in the
Proceedings of the 36th SIGPLAN-SIGACT Symposium on
Principles of Programming Languages (POPL), 2009, and
in the Proceedings of the 22nd International Conference on
Theorem Proving in Higher-Order Logics (TPHOLs), 2009.