software components: as long as control flow is never
returned to the caller with locks held, the subs ystem will
remain composable.
Second (and perhaps counterintuitively), one can
achieve concurrency and composability by having no
locks whatsoever. In this case, there must be no global
subsystem state—subsystem state must be captured in
per-instance state, and it must be up to consumers of the
subsystem to assure that they do not access their instance
in parallel. By leaving locking up to the client of the subsystem, the subsystem itself can be used concurrently by
different subsystems and in different contexts. A concrete
example of this is the AVL tree implementation used
extensively in the Solaris kernel. As with any balanced
binary tree, the implementation is sufficiently complex
to merit componentization, but by not having any global
state, the implementation may be used concurrently by
disjoint subsystems—the only constraint is that manipulation of a single AVL tree instance must be serialized.
Don’t use a semaphore where a mutex would suffice. A semaphore is a generic synchronization primitive originally described by Dijkstra that can be used to
effect a wide range of behavior. It may be tempting to use
semaphores in lieu of mutexes to protect critical sections,
but there is an important difference between the two
constructs: unlike a semaphore, a mutex has a notion of
ownership—the lock is either owned or not, and if it is
owned, it has a known owner. By contrast, a semaphore
(and its kin, the condition variable) has no notion of
ownership: when sleeping on a semaphore, one has no
way of knowing which thread one is blocking upon.
The lack of ownership presents several problems when
used to protect critical sections. First, there is no way of
propagating the blocking thread’s scheduling priority
to the thread that is in the critical section. This ability
to propagate scheduling priority—priority inheritance—is
critical in a realtime system, and in the absence of other
protocols, semaphore-based systems will always be
vulnerable to priority inversions. A second problem with
the lack of ownership is that it deprives the system of the
ability to make assertions about itself. For example, when
ownership is tracked, the machinery that implements
thread blocking can detect pathologies such as deadlocks
and recursive lock acquisitions, inducing fatal failure (and
that all-important core dump) upon detection. Finally,
the lack of ownership makes debugging much more
onerous. A common pathology in a multithreaded system
is a lock not being dropped in some errant return path.
When ownership is tracked, one at least has the smoking
gun of the past (faulty) owner—and, thus, clues as to the
code path by which the lock was not correctly dropped.
Without ownership, one is left clueless and reduced to
debugging by staring at code/the ceiling/into space.
All of this is not to say that semaphores shouldn’t be
used (indeed, some problems are uniquely suited to a
semaphore’s semantics), just that they shouldn’t be used
when mutexes would suffice.
Consider memory retiring to implement per-chain
hash-table locks. Hash tables are common data structures
in performance-critical systems software, and sometimes
they must be accessed in parallel. In this case, adding a
lock to each hash chain, with the per-chain lock held
while readers or writers iterate over the chain, seems
straightforward. The problem, however, is resizing the
table: dynamically resizing a hash table is central to its
efficient operation, and the resize means changing the
memory that contains the table. That is, in a resize the
pointer to the hash table must change—but we do not
wish to require hash lookups to acquire a global lock to
determine the current hash table!
This problem has several solutions, but a (relatively)
straightforward one is to retire memory associated with
There is universal
agreement that
writing
multithreaded code
is difficult.
old hash tables instead of freeing it. On a resize, all
per-chain locks are acquired (using a well-defined order
to prevent deadlock), and a new table is then allocated,
with the contents of the old hash table being rehashed
into the new table. After this operation, the old table is
not deallocated but rather placed in a queue of old hash
tables. Hash lookups then require a slight modification to
operate correctly: after acquiring the per-chain lock, the
lookup must check the hash-table pointer and compare
it with the hash-table pointer that was used to determine
the hash chain. If the hash table has changed (that is,
if a hash resize has occurred), it must drop the lock and
repeat the lookup (which will acquire the correct chain
lock in the new table).
There are some delicate issues in implementing
this—the hash-table pointer must be declared volatile,