In Djit+, each thread has its own clock that is incremented at each lock release operation. Each thread t also
keeps a VC Ct such that, for any thread u, the clock entry
Ct(u) records the clock for the last operation of thread u
that happens before the current operation of thread t. In
addition, the algorithm maintains a VC Lm for each lock m.
These VCs are updated on synchronization operations that
impose a happens-before order between operations of different threads. For example, when thread u releases lock m,
the Djit+ algorithm updates Lm to be Cu. If a thread t subsequently acquires m, the algorithm updates Ct to be Ct Lm,
since subsequent operations of thread t now happen after
that release operation.
To identify conflicting accesses, the Djit+ algorithm keeps
two VCs, Rx and Wx, for each variable x. For any thread t, Rx(t)
and Wx(t) record the clock of the last read and write to x by
thread t. A read from x by thread u is race-free provided it
happens after the last write of each thread, that is, Wx Cu.
A write to x by thread u is race-free provided that the write
happens after all previous accesses to that variable, that is,
Wx Cu and Rx Cu.
As an example, consider the execution trace fragment
shown in Figure 1, where we include the relevant portion
of the Djit+ instrumentation state: the VCs C0 and C1 for
threads 0 and 1; and the VCs Lm and Wx for the last release
of lock m and the last write to variable x, respectively. We
show two components for each VC, but the target program
may of course contain additional threads.a
At the write wr(0, x), Djit+ updates Wx with current
clock of thread 0. At the release rel(0, m), Lm is updated
with C0. At the acquire acq( 1, m), C1 is joined with Lm, thus
capturing the dashed release-acquire happens-before
figure 1. execution trace under Djit+.
C0
á4,0,…ñ
wr(0,x)
C1
á0, 8,…ñ
Lm
á0,0,…ñ
Wx
á0,0,…ñ
á4,0,…ñ
á0, 8,…ñ
á0,0,…ñ
á4,0,…ñ
á5,0,…ñ
á5,0,…ñ
á5,0,…ñ
á0, 8,…ñ
á4, 8,…ñ
wr( 1,x)
á4, 8,…ñ
á4,0,…ñ
á4,0,…ñ
á4,0,…ñ
á4,0,…ñ
á4,0,…ñ
á4, 8,…ñ
a For clarity, we present a variant of the Djit+ algorithm where some clocks
are one less than in the original formulation. 16 This revised algorithm has
the same performance as the original but is slightly simpler and more directly comparable to Fast Track.
edge shown above. At the second write, Djit+ compares
the VCs:
Wx = á4, 0, ...ñ á4, 8, ...ñ = C1
Since this check passes, the two writes are not concurrent,
and no race condition is reported.
3. the fasttrack aLGoRithm
A limitation of VC-based race detectors such as Djit+ is their
performance, since each VC requires O(n) space and each VC
operation (copying, comparing, joining, etc.) requires O(n)
time.
Empirical benchmark data indicates that reads and
writes operations account for the vast majority (over
96%) of monitored operations. The key insight behind
FastTrack is that the full generality of VCs is not necessary in over 99% of these read and write operations: a
more lightweight representation of the happens-before
information can be used instead. Only a small fraction of
operations performed by the target program necessitate
expensive VC operations.
We begin by providing an overview of how our analysis
catches each type of race condition on a memory location. A
race condition is either: a write–write race condition (where a
write is concurrent with a later write); a write–read race condition (where a write is concurrent with a later read); or a read–
write race condition (where a read is concurrent with a later
write).
Detecting Write–Write races: We first consider how to
efficiently analyze write operations. At the second write
operation in the trace in Figure 1, Djit+ compares the
VCs Wx C1 to determine whether there is a race. A careful inspection reveals, however, that it is not necessary
to record the entire VC á4, 0, …ñ from the first write to x.
Assuming no races have been detected on x so far, then
all writes to x are totally ordered by the happens-before
relation, and the only critical information that needs to
be recorded is the clock ( 4) and identity (thread 0) of the
thread performing the last write. This information (clock
4 of thread 0) is sufficient to determine if a subsequent
write to x is in a race with any preceding write.
We refer to a pair of a clock c and a thread t as an
epoch, denoted c@t. Although rather simple, epochs provide the crucial lightweight representation for recording
sufficiently-precise aspects of the happens-before relation
efficiently. Unlike VCs, an epoch requires only constant
space and supports constant-time operations.
An epoch c@t happens before a VC V (c@t V) if and only
if the clock of the epoch is less than or equal to the corresponding clock in the vector:
c@t V iff c ≤ V(t)
We use e to denote a minimal epoch 0@0.
Using this optimized representation, Fast Track analyzes
the trace from Figure 1 using a more compact instrumentation state that records only a write epoch Wx for variable x,
rather than the entire VC Wx, reducing space overhead, as