It is lock-free because we can easily
implement a lock-free exchanger using
a CAS operation, and the shared stack
itself is already lock-free.
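For concreteness, the following is a minimal sketch of such a CAS-based exchanger, in the spirit of the standard textbook construction: a single slot cycles through EMPTY, WAITING, and BUSY states, and one compareAndSet on the slot is what actually pairs two threads. The class and field names are illustrative, not the article's own code.

    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;
    import java.util.concurrent.atomic.AtomicStampedReference;

    public class LockFreeExchanger<T> {
        private static final int EMPTY = 0, WAITING = 1, BUSY = 2;
        private final AtomicStampedReference<T> slot =
            new AtomicStampedReference<>(null, EMPTY);

        public T exchange(T myItem, long timeout, TimeUnit unit)
                throws TimeoutException {
            long deadline = System.nanoTime() + unit.toNanos(timeout);
            int[] stamp = { EMPTY };
            while (true) {
                if (System.nanoTime() > deadline) throw new TimeoutException();
                T other = slot.get(stamp);
                switch (stamp[0]) {
                case EMPTY:    // slot is free: park our item and wait for a partner
                    if (slot.compareAndSet(other, myItem, EMPTY, WAITING)) {
                        while (System.nanoTime() < deadline) {
                            other = slot.get(stamp);
                            if (stamp[0] == BUSY) {          // a partner took our item
                                slot.set(null, EMPTY);
                                return other;
                            }
                        }
                        // nobody came: try to withdraw our item and time out
                        if (slot.compareAndSet(myItem, null, WAITING, EMPTY))
                            throw new TimeoutException();
                        other = slot.get(stamp);             // a partner slipped in after all
                        slot.set(null, EMPTY);
                        return other;
                    }
                    break;
                case WAITING:  // someone is waiting: swap items with a single CAS
                    if (slot.compareAndSet(other, myItem, WAITING, BUSY))
                        return other;
                    break;
                case BUSY:     // two other threads are mid-exchange; retry
                    break;
                }
            }
        }
    }

Note that the exchange itself is accomplished by the single CAS in the WAITING case; the thread that arrived first merely spins until it observes the BUSY stamp and collects its partner's item.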
In the EliminationBackoffStack, the Elimination Array is
used as a backoff scheme to a shared
lock-free stack. Each thread first accesses the stack, and if it fails to complete its call (that is, the CAS attempt
on top fails) because there is contention, it attempts to eliminate using
the array instead of simply backing off
in time. If it fails to eliminate, it calls
the lock-free stack again, and so on. A
thread dynamically selects the subrange of the array within which it tries
to eliminate, growing and shrinking it
exponentially in response to the load.
Picking a smaller subrange allows a
greater chance of a successful rendezvous when there are few threads, while
a larger range lowers the chances of
threads waiting on a busy Exchanger
when the load is high.
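The loop below sketches this push logic in Java, under a few assumptions: the shared stack is a Treiber-style stack whose push is a single CAS on top, the elimination array is modeled with java.util.concurrent.Exchanger slots (a timed lock-free exchanger such as the one sketched earlier would serve the same role), and the grow/shrink policy, the POP marker, the capacity, and the timeout are simple illustrative choices rather than the exact adaptive scheme of the original algorithm.

    import java.util.Random;
    import java.util.concurrent.Exchanger;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;
    import java.util.concurrent.atomic.AtomicReference;

    public class EliminationBackoffStack<T> {
        private static final Object POP = new Object();   // marker offered by pop()
        private static final int CAPACITY = 16;           // assumed elimination-array size

        private static class Node<T> {
            final T value;
            Node<T> next;
            Node(T value) { this.value = value; }
        }

        private final AtomicReference<Node<T>> top = new AtomicReference<>(null);
        private final Exchanger<Object>[] eliminationArray;
        private final Random random = new Random();
        private int range = 1;  // subrange to eliminate in; per-thread in a real implementation

        @SuppressWarnings("unchecked")
        public EliminationBackoffStack() {
            eliminationArray = new Exchanger[CAPACITY];
            for (int i = 0; i < CAPACITY; i++) eliminationArray[i] = new Exchanger<>();
        }

        // One attempt to swing the top pointer, as in a Treiber stack.
        private boolean tryPush(Node<T> node) {
            Node<T> oldTop = top.get();
            node.next = oldTop;
            return top.compareAndSet(oldTop, node);
        }

        public void push(T value) {
            Node<T> node = new Node<>(value);
            while (true) {
                if (tryPush(node)) return;                  // fast path: CAS on top succeeded
                try {                                       // contention: try to eliminate instead
                    Object met = eliminationArray[random.nextInt(range)]
                            .exchange(value, 1, TimeUnit.MILLISECONDS);
                    if (met == POP) return;                 // a concurrent pop took our value
                    range = Math.min(2 * range, CAPACITY);  // met another push: grow the subrange
                } catch (TimeoutException e) {
                    range = Math.max(range / 2, 1);         // nobody showed up: shrink it
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();     // preserve interrupt status; retry stack
                }
            }
        }
        // pop() is symmetric: a CAS attempt that unlinks the top node, and on failure an
        // exchange of POP; receiving a non-marker value means the pop was eliminated.
    }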
In the worst case a thread can still
fail on both the stack and the elimi-
nation. However, if contention is low,
threads will quickly succeed in access-
ing the stack, and as contention grows, there will
be a higher number of successful elim-
inations, allowing many operations to
complete in parallel in only a constant
number of steps. Moreover, contention
at the lock-free stack is reduced be-
cause eliminated operations never ac-
cess the stack. Note that we described
a lock-free implementation, but, as
with many concurrent data structures,
on some systems a lock-based imple-
mentation might be more fitting and
deliver better performance.
An Elimination Tree
A drawback of the elimination backoff
stack is that under very high loads the
number of un-eliminated threads accessing the shared lock-free stack may
remain high, and these threads will continue to have linear complexity. Moreover, if we have, say, bursts of push calls
followed by bursts of pop calls, there
will again be no elimination and therefore no parallelism. The problem seems
to be our insistence on having a linearizable stack: we devised a distributed solution that cuts down on the number of
stalls, but the theoretical worst-case linear-time scenario can happen too often.
This leads us to try an alternative
approach: relaxing the consistency
condition for the stack. Instead of a
linearizable stack, let’s implement a
quiescently consistent one.4,14 A stack is quiescently consistent if in any execution, whenever there are no ongoing push and pop calls, it meets the LIFO stack specification for all the calls that preceded it. In other words, quiescent consistency is like a game of musical chairs: we map the object to the sequential specification when and only when the music stops.
Figure 5. A Tree[4] network leading to four lock-free stacks.
Threads pushing items arrive at the balancers in the order of their numbers, eventually pushing items onto the stacks located on their output wires. In each balancer, a pushing thread fetches and then complements the bit, following the wire indicated by the fetched value (if the state is 0, the pushing thread changes it to 1 and continues on wire 0; if it was 1, the thread changes it to 0 and continues on wire 1). The tree and stacks will end up in the balanced state seen in the figure. The state of the bits corresponds to 5 being the last item pushed, and the next pushed item will end up on the lock-free stack containing item 2. Try it! A popping thread does the opposite of the pushing one: it complements the bit and follows the complemented value. Thus, if a thread executes a pop in the depicted state, it will switch a 1 to a 0 at the top balancer and leave on wire 0, then reach the top second-level balancer, again switching a 1 to a 0 and following its 0 wire, ending up popping the last value, 5, as desired. This behavior holds for concurrent executions as well: the sequences of values in the stacks in all quiescent states can be shown to preserve LIFO order.
As we will see,
this relaxation will nevertheless provide quite powerful semantics for the
data structure. In particular, as with
linearizability, quiescent consistency
allows objects to be composed as black
boxes without having to know anything
about their actual implementation.
Consider a binary tree of objects
called balancers with a single input wire
and two output wires, as depicted in Figure 5. As threads arrive at a balancer, it
repeatedly sends them to the top wire
and then the bottom one, so its top wire always has at most one more thread than the bottom wire. The Tree[k] network is
a binary tree of balancers constructed
inductively by placing a balancer before
two Tree[k/2] networks of balancers
and not shuffling their outputs.22
We add a collection of lock-free
stacks to the output wires of the tree.
To perform a push, threads traverse the
balancers from the root to the leaves and
then push the item onto the appropriate stack. In any quiescent state, when
there are no threads in the tree, the output items are balanced out so that the
top stacks have at most one more than
the bottom ones, and there are no gaps.
We can implement the balancers in
a straightforward way using a bit that
threads toggle: they fetch the bit and
then complement it (a CAS operation),
exiting on the output wire they fetched
(zero or one). How do we perform
a pop? Magically, popping threads also traverse the balancers from the root to the leaves, except that in each balancer, after complementing the bit, they follow this complement, the opposite of the bit they fetched.
Try this; you will see that from one
quiescent state to the next, the items
removed are the last ones pushed onto
the stack. We thus have a collection of
stacks that are accessed in parallel, yet
act as one quiescent LIFO stack.
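To make the construction concrete, here is a hedged Java sketch of such a tree: the balancers' toggle bits live in a heap-ordered array and are flipped with a CAS, a push follows the bit it fetched at each level while a pop follows the complement, and java.util.concurrent.ConcurrentLinkedDeque stands in for the lock-free stacks on the output wires. The layout and names are illustrative assumptions, not the article's code.

    import java.util.concurrent.ConcurrentLinkedDeque;
    import java.util.concurrent.atomic.AtomicInteger;

    public class TreeStack<T> {
        private final int depth;                          // k = 2^depth output wires
        private final AtomicInteger[] toggles;            // one toggle bit per balancer, heap-ordered
        private final ConcurrentLinkedDeque<T>[] stacks;  // stand-ins for the leaf lock-free stacks

        @SuppressWarnings("unchecked")
        public TreeStack(int depth) {
            this.depth = depth;
            int k = 1 << depth;
            toggles = new AtomicInteger[k - 1];
            for (int i = 0; i < k - 1; i++) toggles[i] = new AtomicInteger(0);
            stacks = new ConcurrentLinkedDeque[k];
            for (int i = 0; i < k; i++) stacks[i] = new ConcurrentLinkedDeque<>();
        }

        // Atomically complement the balancer's bit and return the value fetched before the flip.
        private int fetchAndComplement(int node) {
            while (true) {
                int old = toggles[node].get();
                if (toggles[node].compareAndSet(old, 1 - old)) return old;
            }
        }

        // A push exits each balancer on the wire it fetched, then pushes onto the leaf stack.
        public void push(T item) {
            int node = 0;
            for (int level = 0; level < depth; level++) {
                int wire = fetchAndComplement(node);
                node = 2 * node + 1 + wire;               // first child is wire 0, second is wire 1
            }
            stacks[node - toggles.length].push(item);     // node minus (k-1) is the leaf index
        }

        // A pop complements each bit and follows the complement of the value it fetched.
        public T pop() {
            int node = 0;
            for (int level = 0; level < depth; level++) {
                int wire = 1 - fetchAndComplement(node);
                node = 2 * node + 1 + wire;
            }
            return stacks[node - toggles.length].pollFirst(); // null if that stack is empty
        }
    }

With depth 2 this is the Tree[4] network of Figure 5; a complete implementation would also have to decide what a pop does when it reaches a leaf stack that happens to be empty.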
The bad news is that our implementation of the balancers using a bit
means that every thread that enters the
tree accesses the same bit in the root
balancer, causing that balancer to become a bottleneck. This is true, though
to a lesser extent, with balancers lower
in the tree.
We can parallelize the tree by exploiting a simple observation similar
to one we made about the elimination
backoff stack: