is described in ReEnact16; Figure 4 includes examples with a lock, flag, and
barrier.
Each chunk is given a counter value called ChunkID following the
happens-before ordering. Specifically, chunks in a given thread receive
ChunkIDs that increase in program order. Moreover, a synchronization between
two threads orders the ChunkIDs of the chunks involved in the
synchronization. For example, in Figure 4a, the chunk in Thread 2 following
the lock acquire (Chunk 5) sets its ChunkID to be a successor of both the
previous chunk in Thread 2 (Chunk 4) and the chunk in Thread 1 that released
the lock (Chunk 2). For the other synchronization primitives, the algorithm
is similar. For example, for the barrier in Figure 4c, each chunk immediately
following the barrier is given a ChunkID that makes it a successor of all the
chunks leading to the barrier.
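The successor rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not specify how a ChunkID is encoded, so the sketch assumes a vector with one counter per thread (a vector clock), and the name successor and all the example values are hypothetical.

```python
from typing import Tuple

# Assumed encoding: one counter per thread (a vector clock).
ChunkID = Tuple[int, ...]


def successor(thread: int, *preds: ChunkID) -> ChunkID:
    """Return a ChunkID that is a successor of every ChunkID in preds."""
    # Take the element-wise maximum over all predecessors ...
    merged = [max(column) for column in zip(*preds)]
    # ... then advance the counter of the thread that starts the new chunk.
    merged[thread] += 1
    return tuple(merged)


# Lock example with two threads (index 0 and index 1), hypothetical values:
previous_chunk = (0, 1)    # previous chunk in the acquiring thread
releasing_chunk = (2, 0)   # chunk in the other thread that released the lock
after_acquire = successor(1, previous_chunk, releasing_chunk)

# Barrier example with three threads: every chunk after the barrier
# succeeds all the chunks that led to the barrier.
arrivals = [(3, 0, 0), (0, 2, 0), (0, 0, 5)]
after_barrier = [successor(t, *arrivals) for t in range(3)]
```

With this encoding, a lock acquire and a barrier are the same operation applied to a different number of predecessors.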
Using ChunkIDs, we’ve given a partial ordering to the chunks. For example, in
Figure 4a, Chunks 1 and 6 are ordered, but Chunks 3 and 4 are not. Such
ordering helps detect data races that occur in a particular execution.
Specifically, when two chunks from different threads are found to have a data
dependence at runtime, their two ChunkIDs are compared. If the ChunkIDs are
ordered, this is not a data race because there is an intervening
synchronization between the chunks. Otherwise, a data race has been found.
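The comparison step can be sketched as follows. As before, this assumes that a ChunkID is a tuple with one counter per thread; the function names and the example values are hypothetical and are not taken from Figure 4.

```python
def happens_before(a, b):
    """True if ChunkID a is ordered strictly before ChunkID b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def is_data_race(id_a, id_b):
    """Call only for two chunks already found to have a data dependence.

    Ordered ChunkIDs mean a synchronization intervened, so there is no race.
    """
    return not (happens_before(id_a, id_b) or happens_before(id_b, id_a))


# Hypothetical ChunkIDs for a two-thread run:
ordered_pair = ((1, 0), (2, 2))      # the second succeeds the first
unordered_pair = ((3, 0), (2, 2))    # neither succeeds the other

race_when_ordered = is_data_race(*ordered_pair)
race_when_unordered = is_data_race(*unordered_pair)
```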
A simple way to determine when two chunks have a data dependence is to use
the Bulk Multicore signatures to tell when the data footprints of two chunks
overlap. With hardware support, this operation, together with the comparison
and maintenance of ChunkIDs, can be done with low overhead. Consequently, the
Bulk Multicore can detect data races without significantly slowing the
program, making it ideal for debugging production runs.
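To make the footprint-overlap test concrete, here is a minimal software sketch of a signature as a Bloom-filter-style bit vector. The number of fields, the field width, and the hash function are assumptions chosen for illustration; they are not the hardware parameters.

```python
class Signature:
    """Fixed-size, hashed summary of a set of addresses (illustrative)."""

    FIELDS = 4          # assumed number of independently hashed fields
    FIELD_BITS = 256    # assumed number of bits per field

    def __init__(self):
        self.fields = [0] * self.FIELDS

    def _bit(self, addr, field):
        # Simple multiplicative hash, used here only for illustration.
        return ((addr * (2 * field + 3) * 2654435761) >> 7) % self.FIELD_BITS

    def insert(self, addr):
        for f in range(self.FIELDS):
            self.fields[f] |= 1 << self._bit(addr, f)

    def may_overlap(self, other):
        # A shared address sets the same bit in every field of both
        # signatures, so an empty AND in any field proves no overlap.
        return all(a & b for a, b in zip(self.fields, other.fields))


chunk_a = Signature()
for addr in (0x1000, 0x1040):
    chunk_a.insert(addr)

chunk_b = Signature()
for addr in (0x1040, 0x2000):
    chunk_b.insert(addr)

empty = Signature()
```

The test is conservative: it never misses a real overlap, but hashing can occasionally report an overlap that does not exist.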
Enhancing programmability by making signatures visible to software. Finally, a technique that improves programmability further is to make additional
signatures visible to the software. This
support enables inexpensive monitoring of memory accesses, as well as
We propose that the software interact with some additional signatures through three
main primitives:18
The first is to explicitly encode into a signature either one address (Figure 1a) or all
addresses accessed in a code region (Figure 1b). The latter is enabled by the bcollect
(begin collect) and ecollect (end collect) instructions, which can be set to collect only
reads, only writes, or both.
The second primitive is to disambiguate the addresses accessed by the processor
in a code region against a given signature. It is enabled by the bdisamb.loc (begin
disambiguate local) and edisamb.loc (end disambiguate local) instructions (Figure 1c),
and can disambiguate reads, writes, or both.
The third primitive is to disambiguate the addresses of incoming coherence
messages (invalidations or downgrades) against a given local signature. It is enabled
by the bdisamb.rem (begin disambiguate remote) and edisamb.rem (end disambiguate
remote) instructions (Figure 1d) and can disambiguate reads, writes, or both. When
disambiguation finds a match, the system can deliver an interrupt or set a bit.
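The behavior of the encode, collect, and local-disambiguation primitives can be modeled in software as follows. This is a sketch under stated assumptions, not the hardware design: the class SignatureModel and its access method are hypothetical, an exact Python set stands in for the hashed signature, and a match sets a bit rather than delivering an interrupt.

```python
class SignatureModel:
    """Software model of one software-visible signature (illustrative)."""

    def __init__(self):
        self.addrs = set()       # exact set standing in for the signature
        self.collecting = None   # access kinds being collected, or None
        self.watching = None     # access kinds being disambiguated, or None
        self.conflict = False    # the bit set when disambiguation matches

    def encode(self, addr):
        self.addrs.add(addr)

    def bcollect(self, kinds="rw"):      # "r", "w", or "rw"
        self.collecting = set(kinds)

    def ecollect(self):
        self.collecting = None

    def bdisamb_loc(self, kinds="rw"):
        self.watching = set(kinds)
        self.conflict = False

    def edisamb_loc(self):
        self.watching = None

    def access(self, addr, kind):
        """Model one local access; kind is 'r' (read) or 'w' (write)."""
        # Disambiguate first, so a collected address never matches itself.
        if self.watching and kind in self.watching and addr in self.addrs:
            self.conflict = True
        if self.collecting and kind in self.collecting:
            self.addrs.add(addr)


# Watchpoint usage: watch y plus everything a region touches.
sig = SignatureModel()
sig.encode("y")
sig.bcollect()
sig.access("a", "r")      # accesses made inside the collected region
sig.access("b", "w")
sig.ecollect()
sig.bdisamb_loc()
sig.access("c", "w")      # not watched
hit_before = sig.conflict
sig.access("b", "r")      # watched address: the bit is set
hit_after = sig.conflict
```

Remote disambiguation would follow the same pattern, with access driven by incoming coherence messages instead of local loads and stores.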
Figure 2 includes three examples of what can be done with these primitives: Figure
2a shows how the machine inexpensively supports many watchpoints. The processor
encodes into signature Sig2 the address of variable y and all the addresses accessed in
function foo(). It then watches all these addresses by executing bdisamb.loc on Sig2.
Figure 2b shows how a second call to a function that reads and writes memory in
its body can be skipped. In the figure, the code calls function foo() twice with the same
input value of x. To see if the second call can be skipped, the program first collects
all addresses accessed by foo() in Sig2. It then disambiguates all subsequent accesses
against Sig2. When execution reaches the second call to foo(), it can skip the call if two
conditions hold: the first is that the disambiguation did not find a conflict; the second
(not shown in the figure) is that the read and write footprints of the first foo() call do not
overlap. This possible overlap is checked by separately collecting the addresses read
in foo() and those written in foo() in separate signatures and intersecting the resulting
signatures.
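The function-skipping check can be sketched in software as follows. The TracedMemory class, the body of foo, and the run_twice helper are hypothetical and stand in for the hardware collection and disambiguation; exact sets replace the signatures.

```python
class TracedMemory:
    """Memory whose accesses can be collected and disambiguated."""

    def __init__(self, data):
        self.data = dict(data)
        self.reads = None      # active read-collection set, or None
        self.writes = None     # active write-collection set, or None
        self.watch = None      # footprint being disambiguated, or None
        self.conflict = False

    def _touch(self, name):
        if self.watch is not None and name in self.watch:
            self.conflict = True

    def load(self, name):
        self._touch(name)
        if self.reads is not None:
            self.reads.add(name)
        return self.data[name]

    def store(self, name, value):
        self._touch(name)
        if self.writes is not None:
            self.writes.add(name)
        self.data[name] = value


def foo(mem, x):
    """Hypothetical function that reads 'a' and writes 't'."""
    value = mem.load("a") + x
    mem.store("t", value)
    return value


def run_twice(mem, x, between):
    mem.reads, mem.writes = set(), set()        # begin collect
    first = foo(mem, x)
    reads, writes = mem.reads, mem.writes
    mem.reads = mem.writes = None               # end collect
    mem.watch, mem.conflict = reads | writes, False   # begin disambiguate
    between(mem)                                # code between the two calls
    mem.watch = None                            # end disambiguate
    # Skip only if nothing touched the footprint and foo() did not
    # overwrite any of its own inputs.
    can_skip = not mem.conflict and not (reads & writes)
    second = first if can_skip else foo(mem, x)
    return second, can_skip


clean = TracedMemory({"a": 1, "y": 0, "z": 0, "t": 0})
clean_result = run_twice(clean, 5, lambda m: m.store("z", m.load("y")))

dirty = TracedMemory({"a": 1, "y": 0, "z": 0, "t": 0})
dirty_result = run_twice(dirty, 5, lambda m: m.store("a", 10))
```

The second condition matters because a function that overwrites its own inputs would see different values on the second call even if no other code touched them.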
Finally, Figure 2c shows a way to detect data dependences between threads running
on different processors. In the figure, collect encodes all addresses accessed in a
code section into Sig2. Surrounding the collect instructions, the code places disamb.rem
instructions to monitor whether any remotely initiated coherence action conflicts with
addresses accessed locally. To disregard read-read conflicts, the programmer can
collect the reads in a separate signature and perform remote disambiguation of only
writes against that signature.
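The read-read filtering described above can be illustrated with a small sketch. It assumes that an invalidation signals a remote write and a downgrade signals a remote read, and it uses exact sets in place of signatures; the function name remote_conflict and the example addresses are hypothetical.

```python
def remote_conflict(local_reads, local_writes, message_kind, addr):
    """Decide whether an incoming coherence message is a dependence.

    message_kind is 'invalidation' (assumed to signal a remote write)
    or 'downgrade' (assumed to signal a remote read).
    """
    if addr in local_writes:
        # Any remote access to a locally written address is a dependence.
        return True
    if addr in local_reads:
        # Only remote writes matter here; read-read pairs are disregarded.
        return message_kind == "invalidation"
    return False


local_reads = {0x100, 0x140}    # reads collected in one signature
local_writes = {0x200}          # writes collected in a separate signature

read_read = remote_conflict(local_reads, local_writes, "downgrade", 0x100)
write_after_read = remote_conflict(local_reads, local_writes, "invalidation", 0x100)
read_after_write = remote_conflict(local_reads, local_writes, "downgrade", 0x200)
unrelated = remote_conflict(local_reads, local_writes, "invalidation", 0x300)
```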
Making Signatures Visible to Software
Figure 1. Primitives enabling software to interact with additional signatures:
collection (a and b), local disambiguation (c), and remote disambiguation (d).
(a)
    Encode Addr, Sig1
(b)
    bcollect Sig1
    x = ...
    ... = y
    ecollect Sig1
(c)
    bdisamb.loc Sig1
    x = ...
    ... = y
    edisamb.loc Sig1
(d)
    bdisamb.rem Sig1
    x = ...
    ... = y
    edisamb.rem Sig1
Figure 2. Using signatures to support data watchpoints (a), skip execution of
functions (b), and detect data dependencies between threads running on
different processors (c).
(a)
    encode &y, Sig2
    bcollect Sig2
    foo()
    ecollect Sig2
    bdisamb.loc Sig2
    .
    .
(b)
    foo(x)
    ... = y
    z = ...
    foo(x)

    bcollect Sig2
    foo(x)
    ecollect Sig2
    bdisamb.loc Sig2
    ... = y
    z = ...
    edisamb.loc Sig2
    if (conflict) {
        foo(x)
    }
(c)
    bdisamb.rem Sig2
    bcollect Sig2
    z = ...
    ... = y
    ecollect Sig2
    edisamb.rem Sig2