A Lock-Free Stack
As noted, a drawback of our lock-based
implementation, and in fact, of lock-based algorithms in general, is that the
scheduler must guarantee that threads
are preempted infrequently (or not at
all) while holding the locks. Otherwise,
other threads accessing the same locks
will be delayed, and performance will
suffer. This dependency on the capriciousness of the scheduler is particularly problematic in hard real-time systems where one requires a guarantee
on how long method calls will take to
complete.
We can eliminate this dependency by
designing a lock-free stack implementation.23 In the LockFreeStack<T>, instead of acquiring a lock to manipulate the stack, threads agree on who can
modify it by directly applying a CAS to
the top variable. To do so, we only need
to modify the code for the tryPush()
and tryPop() methods, as in Figure 3.
As before, if unsuccessful, the method
calls are repeated after backing off, just
as in the lock-based algorithm.
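Figure 3 is not reproduced here, but the idea can be illustrated with a minimal sketch: tryPush() and tryPop() each attempt a single CAS on an AtomicReference holding the top of the stack, and the public methods retry after backing off. The Node layout and the backoff policy below are placeholder assumptions for illustration.

```java
import java.util.EmptyStackException;
import java.util.concurrent.atomic.AtomicReference;

public class LockFreeStack<T> {
    // Hypothetical node layout; the article's Figure 3 may differ in detail.
    static class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>(null);

    // A single CAS on top decides whether this thread gets to modify the stack.
    protected boolean tryPush(Node<T> node) {
        Node<T> oldTop = top.get();
        node.next = oldTop;
        return top.compareAndSet(oldTop, node);
    }

    public void push(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            if (tryPush(node)) return;
            backoff();   // retry after backing off, as in the lock-based algorithm
        }
    }

    // Returns the removed node on success, null if the CAS failed,
    // and throws if the stack is empty.
    protected Node<T> tryPop() {
        Node<T> oldTop = top.get();
        if (oldTop == null) throw new EmptyStackException();
        Node<T> newTop = oldTop.next;
        return top.compareAndSet(oldTop, newTop) ? oldTop : null;
    }

    public T pop() {
        while (true) {
            Node<T> node = tryPop();
            if (node != null) return node.value;
            backoff();
        }
    }

    // Placeholder backoff policy; a real implementation would back off
    // exponentially rather than merely yield.
    private void backoff() { Thread.yield(); }
}
```

Note that a failed CAS harms no one: the stack is left in a consistent state, and the losing thread simply rereads top and tries again.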
A quick analysis shows the completion of a push (respectively pop) method call cannot be delayed by the preemption of some thread: the stack’s state is
changed by a single CAS operation that
either completes or not, leaving the
stack ready for the next operation. Thus,
a thread can only be delayed if the scheduler runs infinitely many other calls that successfully modify the top of the stack, causing its own tryPush() to fail continuously. In
other words, the system as a whole will
always make progress no matter what
the scheduler does. We call this form
of progress lock-freedom. In many data
structures, having at least some of the
structure’s methods be lock-free tends
to improve overall performance.
It is easy to see that the lock-free
stack is linearizable: it behaves like a
sequential stack whose methods “take
effect” at the points in time where their
respective CAS on the top variable succeeded (or threw the exception in case of
a pop on an empty stack). We can thus
compose this stack with other linearizable objects without worrying about the
implementation details: as far as safety
goes, there is no difference between the
lock-based and lock-free stacks.
An Elimination Backoff Stack
Like the lock-based stack, the lock-free
stack implementation scales poorly,
primarily because its single point of
access forms a sequential bottleneck:
method calls can proceed only one
after the other, ordered by successful
CAS calls applied to the stack’s lock
or top fields. A sad fact we should acknowledge is that this sequential bottleneck is inherent: in the worst case it takes a thread at least Ω(n) steps and/or
stalls (recall, a stall is the delay a thread
incurs when it must wait for another
thread taking a step) to push or pop a
linearizable lock-free stack.9 In other
words, the theory tells us there is no
way to avoid this bottleneck by distributing the stack implementation over
multiple locations; there will always be
an execution of linear complexity.
Surprisingly, though, we can introduce parallelism into many of the common-case executions of a stack implementation. We do so by exploiting the
following simple observation: if a push
call is immediately followed by a pop
call, the stack’s state does not change;
the two calls eliminate each other and
it is as if both operations never happened. By causing concurrent pushes
and pops to meet and pair up in separate memory locations, the thread calling push can exchange its value with a
thread calling pop, without ever having
to access the shared lock-free stack.
As depicted in Figure 4, in the
EliminationBackoffStack<T>11
one achieves this effect by adding an
EliminationArray to the lock-free
stack implementation. Each location
in the array is a coordination structure
called an exchanger,16,18 an object that allows a pair of threads to rendezvous and
exchange values.
Threads pick random array entries
and try to pair up with complementary
operations. The calls exchange values
in the location in which they met, and
return. A thread whose call cannot be
eliminated, either because it has failed
to find a partner, or because it found a
partner with the wrong type of method
call (such as a push meeting a push),
can either try again to eliminate at a
new location, or can access the shared
lock-free stack. The combined data
structure, array and stack, is linearizable because the lock-free stack is linearizable, and we can think of the eliminated calls as if they occurred at the
point at which they exchanged values.
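The rendezvous described above can be sketched with the JDK's java.util.concurrent.Exchanger; the article's exchanger16,18 is its own construction, and the slot-selection policy, the null-as-pop convention, and the timeout below are assumptions for illustration only.

```java
import java.util.concurrent.Exchanger;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of an elimination array built from JDK Exchangers.
// Convention (an assumption, not the article's code): a pusher offers its
// non-null value, a popper offers null. A push/pop pair that meets at the
// same slot eliminates: the popper receives the pusher's value, and the
// pusher receives null as confirmation.
public class EliminationArray<T> {
    private final Exchanger<T>[] exchangers;
    private final long timeoutNanos;

    @SuppressWarnings("unchecked")
    public EliminationArray(int capacity, long timeoutNanos) {
        exchangers = new Exchanger[capacity];
        for (int i = 0; i < capacity; i++) exchangers[i] = new Exchanger<>();
        this.timeoutNanos = timeoutNanos;
    }

    // Pick a random slot and wait (bounded) for a partner to show up.
    // Values are assumed non-null; null marks a pop.
    public T visit(T value) throws TimeoutException {
        int slot = ThreadLocalRandom.current().nextInt(exchangers.length);
        try {
            return exchangers[slot].exchange(value, timeoutNanos, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new TimeoutException();
        }
    }
}
```

Under this convention, a pusher that receives a non-null reply has met another pusher (the wrong type of method call) and must retry or fall back to the lock-free stack; likewise a popper that receives null has met another popper. A TimeoutException means no partner arrived, and the thread falls back to the shared stack.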