it is an interface property, the memory
model decision has a long-lasting impact, affecting portability and maintainability of programs. Thus, a hardware
architecture committed to a strong
memory model cannot later forsake it
for a weaker model without breaking binary compatibility, and a new compiler
release with a weaker memory model
may require rewriting source code. Finally, memory-model-related decisions
for a single component must consider
implications for the rest of the system.
A processor vendor cannot guarantee
a strong hardware model if the memory system designer provides a weaker
model; a strong hardware model is not
very useful to programmers using languages and compilers that provide only
a weak guarantee.
Nonetheless, the central role of the
memory model has often been down-played. This is partly because formally
specifying a model that balances all
desirable properties of programmability, performance, and portability has
proven surprisingly complex. At the
same time, informal, machine-specific
descriptions proved mostly adequate in
an era where parallel programming was
the domain of experts and achieving the
highest possible performance trumped
programmability or portability arguments.
In the late 1980s and 1990s, the area
received attention primarily in the hardware community, which explored many
approaches, with little consensus. 2
Commercial hardware memory model
descriptions varied greatly in precision,
including cases of complete omission
of the topic and some reflecting vendors’ reluctance to make commitments
with unclear future implications. Although the memory model affects the
meaning of every load instruction in every multithreaded application, it is still
sometimes relegated to the “systems
programming” section of the architecture manual.
Part of the challenge for hardware
architects was the lack of clear memory
models at the programming language
level. It was unclear what programmers
expected hardware to do. Although
hardware researchers proposed approaches to bridge this gap, 3
widespread adoption required consensus
from the software community. Before
2000, there were a few programming
key insights
memory models, which describe the
semantics of shared variables, are
crucial to both correct multithreaded
applications and the entire underlying
implementation stack. it is difficult
to teach multithreaded programming
without clarity on memory models.
after much prior confusion, major
programming languages are converging
on a model that guarantees simple
interleaving-based semantics for
“data-race-free” programs and most
hardware vendors have committed to
support this model.
This process has exposed fundamental
shortcomings in our languages and
a hardware-software mismatch.
Semantics for programs that contain
data races seem fundamentally difficult,
but are necessary for concurrency
safety and debuggability. We call upon
software and hardware communities
to develop languages and systems
that enforce data-race-freedom, and
co-designed hardware that exploits and
supports such semantics.
environments that addressed the issue with relative clarity, 40 but the most
widely used environments had unclear
and questionable specifications. 9, 32
Even when specifications were relatively clear, they were often violated to obtain sufficient performance, 9 tended to
be misunderstood even by experts, and
were difficult to teach.
Since 2000, we have been involved in
efforts to cleanly specify programming-language-level memory models, first
for Java and then C++, with efforts now
under way to adopt similar models for C
and other languages. In the process, we
had to address issues created by hardware that had evolved without the benefit of a clear programming model. This
often made it difficult to reconcile the
need for a simple and usable programming model with that for adequate performance on existing hardware.
Today, these languages and most
hardware vendors have published (or
plan to publish) compatible memory
model specifications. Although this
convergence is a dramatic improve-
ment over the past, it has exposed fun-
damental shortcomings in our parallel
languages and their interplay with hard-
ware. After decades of research, it is still
unacceptably difficult to describe what
value a load can return without com-
promising modern safety guarantees or
implementation methodologies. To us,
this experience has made it clear that
solving the memory model problem will
require a significantly new and cross-
disciplinary research direction for par-
allel computing languages, hardware,
and environments as a whole.
Sequential consistency
A natural view of the execution of a
multithreaded program operating on
shared variables is as follows. Each
step in the execution consists of choosing one of the threads to execute, and
then performing the next step in that
thread’s execution (as dictated by the
thread’s program text, or program order). This process is repeated until the
program as a whole terminates. Effectively, the execution can be viewed as
taking all the steps executed by each
thread, and interleaving them in some
way. Whenever an object (that is, variable, field, or array element) is accessed,
the last value stored to the object by this
interleaved sequence is retrieved.
For example, consider Figure 1,
which gives the core of Dekker’s mutual
exclusion algorithm. The program can
be executed by interleaving the steps
from the two threads in many ways. Formally, each of these interleavings is a
total order over all the steps performed
by all the threads, consistent with the
program order of each thread. Each access to a shared variable “sees” the last
prior value stored to that variable in the
interleaving.
Figure 2 gives three possible executions that together illustrate all possible