Reads and writes of a volatile field v of an object o, denoted volread(o, v) and volwrite(o, v), are implemented by the xload and xstore
byte-code instructions, respectively. While volread(o, v)
corresponds to a sync-sink operation, volwrite(o, v) corresponds to
a sync-source operation on (o, v). In the JMM, there is a synchronizes-with relationship between each volread(o, v) and
the volwrite(o, v) that it sees.
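As a concrete illustration (a sketch, not an example from the paper), the following Java program uses a volatile flag as the sync-source/sync-sink pair described above: the volatile write is a sync-source, the read that sees it is a sync-sink, and the resulting synchronizes-with edge makes the earlier ordinary write visible to the reader.

```java
// Sketch: a volatile handoff. The write to `ready` (volwrite) synchronizes
// with the read of `ready` (volread) that sees it, so the JMM guarantees the
// reader observes data == 42 once the flag is true.
public class VolatileHandoff {
    static int data;               // ordinary (non-volatile) field
    static volatile boolean ready; // volatile field: the synchronization variable

    static int handoff() throws InterruptedException {
        int[] seen = new int[1];
        Thread writer = new Thread(() -> {
            data = 42;    // ordinary write, ordered before the volwrite
            ready = true; // volwrite: sync-source
        });
        Thread reader = new Thread(() -> {
            while (!ready) { } // volread: sync-sink; spins until it sees the write
            seen[0] = data;    // visible via the synchronizes-with edge
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        return seen[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(handoff()); // prints 42
    }
}
```

Without the volatile modifier on `ready`, the reader's spin loop might never observe the update, and the program would contain a data race on both fields.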
Fork/Join: Creating a new thread with id t (fork(t)) synchronizes with the first action of thread t, denoted start(t).
The last action of thread t, denoted end(t), synchronizes
with the join operation on t, denoted join(t). For each
thread t, fork(t) and end(t) correspond to sync-source
operations on a (fictitious) synchronization variable t̂,
and start(t) and join(t) correspond to sync-sink
operations on t̂. The JMM guarantees that for each thread t,
there exists an order →so t̂ such that: fork(t) →so t̂ start(t) →so t̂
end(t) →so t̂ join(t).
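A small Java sketch (ours, not from the paper) of these fork/join edges: the write made before start() is visible to the new thread via the fork(t) → start(t) edge, and the thread's own writes are visible after join() via the end(t) → join(t) edge, with no locks or volatiles involved.

```java
// Sketch: ordering established purely by fork/join synchronization.
public class ForkJoinHB {
    static int shared; // no locks or volatiles: ordering comes only from fork/join

    static int run() throws InterruptedException {
        shared = 1;                                // happens-before fork(t)
        Thread t = new Thread(() -> shared += 41); // thread t sees 1, writes 42
        t.start();                                 // fork(t): sync-source on t̂
        t.join();                                  // join(t): sync-sink for end(t)
        return shared;                             // guaranteed to read 42
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints 42
    }
}
```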
Handling other synchronization mechanisms: Using the
lockset update rules above, Goldilocks is able to uniformly handle various approaches to synchronization, such
as dynamically changing locksets, permanent or temporary
thread-locality of objects, container-protected objects, and ownership transfer of a variable without accessing the variable (as
in the example in Section 3.1). Furthermore, Goldilocks
can also handle the synchronization idioms in the java.util.concurrent package, such as semaphores and barriers, since these primitives are built using locks and volatile
variables. The happens-before edges induced by the wait/notify(All) construct are computed by simply applying the
Goldilockset update rules to the acquire and release operations invoked inside wait.
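To make the last point concrete, here is a standard Java wait/notify usage (a sketch of ours, not code from the paper): wait releases the monitor and re-acquires it when the thread wakes, so the acquire and release operations it performs internally are exactly what the lockset update rules observe.

```java
// Sketch: a one-shot handoff via wait/notifyAll. The release of `lock` in
// publish() and the re-acquire performed inside wait() form the acquire/release
// pair that induces the happens-before edge making `payload` visible.
public class WaitNotify {
    private final Object lock = new Object();
    private boolean ready;
    private int payload;

    int await() throws InterruptedException {
        synchronized (lock) {           // acquire(lock)
            while (!ready) lock.wait(); // wait releases lock, then re-acquires it
            return payload;             // visible via the release/acquire pair
        }
    }

    void publish(int v) {
        synchronized (lock) {           // acquire(lock)
            payload = v;
            ready = true;
            lock.notifyAll();
        }                               // release(lock): seen by the waiter's re-acquire
    }
}
```

The `while` loop around wait guards against spurious wakeups; the race detector needs no special case for it, since every iteration performs a fresh release/acquire.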
3.4. Race detection and sequential consistency
The Java and C++ memory models provide the data-race freedom (DRF) property.2,10 The DRF property guarantees that if
all sequentially consistent executions of a source program
are race free, then the compiled program only exhibits these
sequentially consistent executions of the source program,
after any compiler and hardware optimizations permitted by
the memory model. The Goldilocks algorithm checks for races
by monitoring the executions of the compiled program, and
assumes that the compiler and the runtime it is built on
(hardware or virtual machine) conform to the language and
the memory model specifications. Therefore, if the source
program is race free, then any execution of the compiled
program corresponds to a sequentially consistent execution of the source program, and no DataRaceException
is thrown.
If the source program has a race, the Goldilocks run-time still ensures that all executions of the compiled program will run under the sequential consistency semantics,
i.e., sequential consistency is guaranteed at the byte-code
level. This is accomplished by preventing accesses that will
cause a data race and throwing a DataRaceException
right before that access. However, in the case of a racy
program, the JMM permits compiler optimizations that
result in executions that are not sequentially consistent
behaviors of the original source code. In this case, the JMM
and the DRF property are not strong enough to allow the
Goldilocks runtime to relate byte-code level executions
to executions of the source-level program, which makes
debugging hard.

90 Communications of the ACM | November 2010 | Vol. 53 | No. 11
To use Goldilocks for debugging purposes, this difficulty can be remedied by disabling compiler optimizations.
For post-deployment use, a stronger memory model9,11 that
is able to relate each (racy and race-free) execution of the
compiled program to a sequentially consistent execution of
the source program is needed.
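The kind of racy program at issue can be as simple as the following Java sketch (ours, not from the paper): two threads increment a shared counter with no synchronization. Under the Goldilocks runtime described above, the second conflicting access would be intercepted and a DataRaceException (a class specific to that runtime, not part of standard Java) thrown right before it; on a stock JVM the race simply goes undetected.

```java
// Sketch: a data race on `counter`. The two read-modify-write sequences are
// unordered by happens-before, so one increment may be lost.
public class RacyCounter {
    static int counter; // accessed by both threads with no synchronization

    public static void main(String[] args) throws InterruptedException {
        Thread a = new Thread(() -> counter++); // races with thread b
        Thread b = new Thread(() -> counter++); // races with thread a
        a.start(); b.start();
        a.join(); b.join();
        // The JMM permits either result; sequential consistency is not guaranteed.
        System.out.println(counter); // may print 1 or 2
    }
}
```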
4. Implementing Goldilocks
There are two published implementations of the Goldilocks
algorithm, both of which monitor the execution at the Java
byte-code level. At this level, each variable access or synchronization operation corresponds to a single byte-code
instruction, and each byte-code instruction can be associated with a source code line and/or variable.
The first Goldilocks implementation, by the authors of
this paper, was carried out in Kaffe,19 a clean-room implementation of the Java virtual machine (JVM) in C. We
integrated Goldilocks into the interpreting mode of
Kaffe's runtime engine. Implementing the algorithm in the
JVM enables fast access to the internal data structures of the
JVM that manage the layout of objects in memory, and allows
the use of efficient mechanisms available internally in the JVM, such as fast mutex locks.
The second implementation of Goldilocks is by Flanagan
and Freund and was carried out using the RoadRunner dynamic program analysis tool.8 In RoadRunner, Goldilocks
is implemented in Java and injected by byte-code instrumentation at load-time of the program. This allows the algorithm
to benefit from Java compiler optimizations and just-in-time
compilation and to be portable to any JVM. Flanagan and
Freund showed that this implementation is competitive with
ours in Kaffe for most of the common benchmarks.7
In the following, we present the most important implementation features and optimizations. The implementation
is described based on the core algorithm presented in Figure
2. The extension of the implementation that distinguishes
read and write accesses can be found in Elmas et al.6
4.1. Implicit representation and lazy evaluation of Goldilocksets
For programs with a large number of data variables, representing Goldilocksets explicitly for each data variable and
implementing the Goldilocks algorithm as described in
Figure 2 may have high memory and computational cost.
We avoid the memory cost by representing the Goldilocksets
implicitly and the computational cost by evaluating
Goldilocksets lazily as described below.
Instead of keeping a separate Goldilockset GLS(x) for each
variable x, we represent GLS(x) implicitly as long as no access
to x happens, and compute it temporarily when an access
happens. At this point the Goldilockset is a singleton, and we
continue to represent it implicitly until the next access. For
this, we keep the synchronization events in a single, global
linked list, called the synchronization-event list, represented by its head and tail pointers in Figure 4. The ordering of
these events in the list is consistent with the program order
→po t for each thread t and the synchronization orders →so o for