Doi: 10.1145/1506409.1506431
By William N. Scherer III, Doug Lea, and Michael L. Scott
abstract
In a thread-safe concurrent queue, consumers typically wait for producers to make data available. In a synchronous queue, producers similarly wait for consumers to take the data. We present two new nonblocking, contention-free synchronous queues that achieve high performance through a form of dualism: The underlying data structure may hold both data and, symmetrically, requests.
We present performance results on 16-processor SPARC and 4-processor Opteron machines. We compare our algorithms to commonly used alternatives from the literature and from the Java SE 5.0 class java.util.concurrent .SynchronousQueue both directly in synthetic microbenchmarks and indirectly as the core of Java’s ThreadPoolExecutor mechanism. Our new algorithms consistently outperform the Java SE 5.0 SynchronousQueue by factors of three in unfair mode and 14 in fair mode; this translates to factors of two and ten for the ThreadPoolExecutor. Our synchronous queues have been adopted for inclusion in Java 6.
1. intRoDuction
Mechanisms to transfer data between threads are among the most fundamental building blocks of concurrent systems. Shared memory transfers are typically effected via a concurrent data structure that may be known variously as a buffer, a channel, or a concurrent queue. This structure serves to “pair up” producers and consumers. It can also serve to smooth out fluctuations in their relative rates of progress by buffering unconsumed data. This buffering, in systems that provide it, is naturally asymmetric: A consumer that tries to take data from an empty concurrent queue will wait for a producer to perform a matching put operation; however, a producer need not wait to perform a put unless space has run out. That is, producers can “run ahead” of consumers, but consumers cannot “run ahead” of producers.
A synchronous queue provides the “pairing up” function without the buffering; it is entirely symmetric: Producers and consumers wait for one another, “shake hands,” and leave in pairs. For decades, synchronous queues have played a prominent role in both the theory and practice of concurrent programming. They constitute the central synchronization primitive of Hoare’s CSP8 and of languages derived from it, and are closely related to the rendezvous of Ada. They are also widely used in message-passing software and in stream-style “hand-off” algorithms. 2, Chap. 8 (In this paper we focus on synchronous queues within a multithreaded program, not across address spaces or distributed nodes.)
Unfortunately, design-level tractability of synchronous queues has often come at the price of poor performance.
“Textbook” algorithms for put and take may repeatedly suffer from contention (slowdown due to conflicts
with other threads for access to a cache line) and/or blocking (loops or scheduling operations that wait for activity in another thread). Listing 1, for example, shows one of the most commonly used implementations, due to Hanson. 3 It employs three separate semaphores, each of which is a potential source of contention and (in acquire operations) blocking.a
The synchronization burden of algorithms like Hanson’s is especially significant on modern multicore and multiprocessor machines, where the OS scheduler may take thousands of cycles to block or unblock threads. Even an uncontended semaphore operation usually requires special read-modify-write or memory barrier (fence) instructions, each of which can take tens of cycles.b
Listing 1: hanson’s synchronous queue. semaphore sync indicates whether item is valid (initially, no); send holds 1 minus the number of pending puts; recv holds 0 minus the number of pending takes.
00 public class HansonSQ<E> {
01 E item = null;
02 Semaphore sync = new Semaphore(0);
03 Semaphore send = new Semaphore( 1);
04 Semaphore recv = new Semaphore(0);
05
06
07
08
09
10
11
12
13
14 public void put(E x) {
15 send.acquire();
16 item = x;
17 recv.release();
18 sync.acquire();
19 }
20 }
Public E take() { recv.acquire(); E x = item; sync.release(); send.release(); return x;
}
a
Semaphores are the original mechanism for scheduler-based synchronization (they date from the mid-1960s). Each semaphore contains a counter and a list of waiting threads. An acquire operation decrements the counter and then waits for it to be nonnegative. A release operation increments the counter and unblocks a waiting thread if the result is nonpositive. In effect, a semaphore functions as a non-synchronous concurrent queue in which the transferred data is null. b
Read-modify-write instructions (e.g., compare_and_swap [CAS]) facilitate constructing concurrent algorithms via atomic memory updates. Fences enforce ordering constraints on memory operations.
A previous version of this paper was published in Proceedings of the 11th ACM Symposium on Principles and Practice of Parallel Programming, Mar. 2006.
References:
Archives