primitives, as well as implicit guarantees of ordering among memory accesses, usually specified by the architecture's or language's memory consistency model.
Prior to Java 5 (2004) and C11/C++11, these languages could not be reliably used (as specified by their standards) to fully express the required memory ordering. Programmers and custom library writers relied on assembly language and machine code to express such ordering.
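As an illustration of that pre-standard practice, ordering could be forced with compiler-specific constructs such as GCC's legacy full-barrier builtin; this is a sketch, not from the article, and the function and variable names are illustrative:

```c
/* Pre-C11/C++11 practice: memory ordering expressed outside the
 * language standard, via a compiler-specific full barrier. */
static int data;
static int ready;

void publish(int v) {
    data = v;
    __sync_synchronize();   /* GCC builtin: full hardware memory barrier */
    ready = 1;              /* a racing reader that observes ready == 1
                               will also observe the write to data */
}
```

By the C11 standard such plain-variable racing accesses are data races; code like this relied on compiler- and hardware-specific behavior, which is exactly the portability problem the standardized atomics later addressed.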
Now, with C11/C++11 and Java (5 and later), programmers can designate some variables as subject to special ordering (volatiles in Java, atomics in C11/C++11). In some cases these designations can be heavy-handed, imposing order around such variables even when the algorithm does not need it. Standard C11/C++11 offers finer levels of ordering that allow the programmer to specify only the desired order.
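For example, C11's explicit orderings let a store carry only release semantics instead of the default sequentially consistent ordering of plain atomic accesses. The following is a minimal sketch (the variable and function names are illustrative):

```c
#include <stdatomic.h>

int payload;                  /* ordinary, non-atomic data */
_Atomic int ready = 0;        /* flag designated for special ordering */

/* Writer: the release store orders the payload write before the flag
 * update, without the cost of full sequential consistency. */
void publish(int v) {
    payload = v;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: an acquire load pairs with the release store, so a reader
 * that observes ready == 1 also observes the payload write. */
int try_consume(int *out) {
    if (atomic_load_explicit(&ready, memory_order_acquire)) {
        *out = payload;
        return 1;
    }
    return 0;
}
```

Using `atomic_store(&ready, 1)` instead would be correct but would request the stronger (and on many platforms costlier) sequentially consistent ordering.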
Implementations of these languages incur varying memory-ordering overheads on different hardware platforms. Designers of nonblocking implementations should take into account the choice of high-level language and its memory-ordering performance on the targeted hardware platforms.
Choice of algorithms. The combined
choice of data structures and algorithms is one issue where big compromises can be made to design a system
that meets its overall requirements.
Nonblocking algorithms vary in
their portability (for example, requirement of special hardware support), reliability (for example, whether or not
they are widely used), complexity (for
example, reasonable ease of implementation, maintenance, and modification), progress guarantees (for example, wait-free and lock-free), and
memory safety and ABA-prevention
features (for example, compatibility or
incompatibility with certain methods).
The choice of algorithms is intertwined
with most of the issues discussed here.
The purpose of the following example
is to demonstrate the interactions
among nonblocking features and issues discussed in this article.
Consider a simplified example of a
kill-safe system that requires process-
Hardware support requirements
˲ No support. For example, the reading and writing of hazard pointers can be nonatomic.16
˲ Nonuniversal atomic operations
(such as fetch-and-add, test-and-set,
and atomic swap). Maurice Herlihy
showed it is impossible to design wait-free (and lock-free) implementations
of certain data types that can be operated on by an arbitrary number of concurrent threads using only such (nonuniversal) operations.7
˲ Compare-and-swap. Herlihy showed
that CAS is a universal operation and
can be used to design implementations of any data type with wait-free
operations without limitation on the
maximum number of threads that operate on it concurrently. Pointer-size
CAS may suffer from the ABA problem.
The classic solution to that problem
requires the use of wider atomic operations (for example, double-width load
and CAS primitives).
˲ The pair LL/SC (for example, larx and stcx on the IBM Power architecture). These were also shown by Herlihy to be universal operations.
As already discussed, they are immune
to the ABA problem in some cases in
which CAS is susceptible. Therefore, pointer-size LL/SC may suffice, or entail simpler code, in cases where double-width CAS would be needed in its absence.
˲ Hardware transactional memory. Recently, IBM (Blue Gene/Q,21 System Z,12 and Power3) and Intel11 architectures have offered hardware support for arbitrary memory transactions. However, most of these HTM implementations (except IBM System Z) are best effort; that is, they require that programmers provide a nontransactional path in case the hardware transactions fail.
Note that if the number of threads
that can concurrently execute certain
operations is limited, nonuniversal
atomic operations may suffice to design wait-free and lock-free implementations. For example, wait-free single-producer or single-consumer FIFO
queue operations15 (by skipping the
appropriate locks in the two-lock algorithm) and single-updater sets with
lock-free lookup by multiple threads16
can be implemented with just atomic
loads and stores.
Choice of language and memory ordering. In addition to their varied requirements for atomic operations, nonblocking algorithms differ in their requirements for memory-ordering primitives. For example, in the Push
operation of the running LIFO-list example (in Figure 1), the write in line 2
must be ordered before the write (in
the case of a successful CAS) in line
3. Hardware platforms and high-level
programming languages offer explicit
Figure 3. Design compromise.