as written. Depending on the hardware, this code can still fail.
The problem this time is that the
hardware may optimize the blue
thread. Nearly all processor architectures allow stores to memory to be
saved in a buffer visible only to that
processor core before writing them
to memory visible to other processor
cores. 2 Some, such as the ARM chip
that is probably in your smartphone,
allow the stores to become visible to
other processor cores in a different
order. On such a processor the blue
thread’s write to done may become
visible to the red thread, running on
another core, before the blue thread’s
write to x. Thus, the red thread may
see done set to true, and the loop
may terminate before it can retrieve
the proper value of x. Thus, when the
red thread accesses x, it may still get
the uninitialized value.
Unlike the original problem of
reading done once outside the loop,
this problem will occur infrequently,
and may well be missed during testing.
Again the core problem here is that
although the done flag is intended to
prevent simultaneous accesses to x, it
can itself be simultaneously accessed
by both threads. And data races are
evil!
be zero when they both finished. Although the original program was well
behaved and had no data races, the
compiler added an implicit update to
b2 that created a data race.
This kind of data-race insertion
has been clearly disallowed in Java for
a long time. The recently published
C++ 11 and C11 standards also disallow it. We know of no Java implementations with such problems, nor do
modern C and C++ compilers generally exhibit precisely this problem. Unfortunately, many do introduce data
races under certain obscure, unlikely,
and unpredictable conditions. This
problem will disappear as C++ 11 and
C11 become widely supported.
For C and C++, the story for bit-fields is slightly more complicated.
We’ll discuss that more, later.
Bits and Bytes
So far, we have talked only about data
races in which two threads access exactly the same variable, or object field,
at the same time. That has not always
been the only concern. According to
some older standards, when you declare two small fields b1 and b2 next
to each other, for example, then updating b1 could be implemented with
the following steps:
1. Load the machine word containing both b1 and b2 into a machine register.
2. Update the b1 piece in the machine register.
3. Store the register back to the location from which it was loaded.
Unfortunately, if another thread
updates b2 just before the last step,
then that update is overwritten by
the last step and effectively lost. If
both fields were initially zero, and
one thread executed b1 = 1, while the
other executed b2 = 1, b2 could still
and the Real Rules are…
The simplest view of threads, and the
one we started with, is that a multithreaded program is executed by interleaving steps from each thread.
Logically the computer executes a step
from one thread, then picks another
thread, or possibly the same one, executes its next step, and so on. This is
a sequentially consistent execution.
As already shown, real machines
and compilers sometimes result in
non-sequentially consistent execu-
tions: for example, when the assign-
ment to a variable and a done flag are
made visible to other threads out of
order. Sequential consistency, how-
ever, is critical in understanding the
behavior of real shared variables, for
two reasons:
figure 1. Two interleaved multi-word increments.
tmp_hi = x_hi;
tmp_lo = x_lo;
(tmp_hi, tmp_lo)++; // tmp_hi = 1, tmp_lo = 0
x_hi = tmp_hi; // x_hi = 1, x_lo = 999, x = 1999
x++; // red runs all steps
// x_hi = 2, x_lo = 0, x = 2000
x_lo = tmp_lo; // x_hi = 2, x_lo = 0
figure 2. Waiting on a flag.
Blue Thread
x = ...;
done = true;
other Threads
while (!done) {}
... = x;