The three main conditional branches (beginning at lines 07,
17, and 26) correspond to the type of node we find.
The first case occurs when the stack is empty or contains
only data (line 07). We attempt to insert a new datum (09),
and wait for a consumer to claim that datum ( 11–12) before
returning. The reservation linearizes in this code path when
we push our datum at line 09; a successful follow-up linearizes when we notice that our data has been taken at line 11.
The second case occurs when the stack contains (only)
reservations ( 17). We attempt to place a fulfilling datum on
the top of the stack ( 19); if we succeed, any other thread that
wishes to perform an operation must now help us fulfill the
request before proceeding to its own work. We then read our
way down the stack to find the successor node to the reservation we are fulfilling ( 21–22) and mark the reservation
fulfilled ( 23). Note that our CAS could fail if another thread
helps us and performs it first. Finally, we pop both the reservation and our fulfilling node from the stack ( 24) and return.
The reservation linearizes in this code path at line 19, when
we push our fulfilling datum above a reservation; the follow-up linearization point occurs immediately thereafter.
The remaining case occurs when we find another thread’s
fulfilling datum or reservation ( 26) at the top of the stack.
We must complete the pairing and annihilation of the top
two stack nodes before we can continue our own work. We
first read our way down the stack to find the data or reservation for which the fulfilling node is present ( 27–28) and then
we mark the underlying node as fulfilled ( 29) and pop the
paired nodes from the stack ( 30).
Referring to Figure 2, when a consumer wishes to retrieve
data from an empty stack, it first must insert a reservation
(step A). It then waits until its data pointer (branching to the
right) is non-null. Meanwhile, if a producer appears, it satisfies
the consumer in a two-step process. First (step B), it pushes a
fulfilling data node at the top of the stack. Then, it swings the
reservation’s data pointer to its fulfilling node (step c). Finally,
it updates the top-of-stack pointer to match the reservation
node’s next pointer (step d, not shown). After the producer
has completed step B, other threads can help update the reservation’s data pointer (step c); and the consumer thread can
additionally help remove itself from the stack (step d).
time-out: Although the algorithms presented in
the sections “The Synchronous Dual Queue” and “The
figure 2: synchronous dual stack: satisfying a reservation.
Top
Top
Top
A
B
Fulfill
data
Fulfill
data
Reserv.
Reserv.
Reserv.
C
Synchronous Dual Stack” are complete implementations
of synchronous queues, real systems require the ability to
specify limited patience so that a producer (or consumer)
can time out if no consumer (producer) arrives soon enough
to pair up. As noted earlier, Hanson’s synchronous queue
offers no simple way to do this. Space limitations preclude
discussion of the relatively straightforward manner in
which we add time-out support to our synchronous queue;
interested readers may find this information in our original
publication. 17
pragmatics: Our synchronous queue implementations
reflect a few additional pragmatic considerations to maintain good performance. First, because Java does not allow
us to set flag bits in pointers (to distinguish among the
types of pointed-to nodes), we add an extra word to nodes,
in which we mark mode bits. We chose this technique over
two primary alternatives. The class java.util.concurrent.
AtomicMarkableReference allows direct association of tag bits
with a pointer, but exhibits very poor performance. Using
runtime type identification (RTTI) to distinguish between
multiple subclasses of the Node classes would similarly
allow us to embed tag bits in the object type information.
While this approach performs well in isolation, it increases
long-term pressure on the JVM’s memory allocation and garbage collection routines by requiring construction of a new
node after each contention failure.
Time-out support requires careful management of memory ownership to ensure that canceled nodes are reclaimed
properly. Automatic garbage collection eases the burden in
Java. We must, however, take care to “forget” references to
data, nodes, and threads that might be retained for a long
time by blocked threads (preventing the garbage collector
from reclaiming them).
The simplest approach to time-out involves marking
nodes as “canceled,” and abandoning them for another
thread to eventually unlink and reclaim. If, however, items
are offered at a very high rate, but with a very low time-out
patience, this “abandonment” cleaning strategy can result in
a long-term build-up of canceled nodes, exhausting memory
supplies and degrading performance. It is important to effect
a more sophisticated cleaning strategy. Space limitations
preclude further discussion here, but interested readers may
find more details in the conference version of this paper.
17
For sake of clarity, the synchronous queues of Figures 5
and 6 blocked with busy-wait spinning to await a counterpart
consumer. In practice, however, busy-wait is useless overhead on a uniprocessor and can be of limited value on even
a small-scale multiprocessor. Alternatives include desched-uling a thread until it is signaled, or yielding the processor
within a spin loop. 9 In practice, we mainly choose the spin-then-yield approach, using the park and unpark methods contained in java.util.concurrent.locks.LockSupport12 to
remove threads from and restore threads to the ready list.
On multiprocessors (only), nodes next in line for fulfillment
spin briefly (about one-quarter the time of a typical context
switch) before parking. On very busy synchronous queues,
spinning can dramatically improve throughput because it
handles the case of a near-simultaneous “flyby” between a
producer and consumer without stalling either. On less busy