Efficient System-Enforced
Deterministic Parallelism
Abstract
Deterministic execution offers many benefits for debugging,
fault tolerance, and security. Current methods of executing
parallel programs deterministically, however, often incur
high costs, allow misbehaved software to defeat repeatability, and transform time-dependent races into input- or path-dependent races without eliminating them. We introduce a
new parallel programming model addressing these issues,
and use Determinator, a proof-of-concept OS, to demonstrate the model’s practicality. Determinator’s microkernel application programming interface (API) provides only
“shared-nothing” address spaces and deterministic interprocess communication primitives to make execution of all
unprivileged code—well-behaved or not—precisely repeatable. Atop this microkernel, Determinator’s user-level runtime offers a private workspace model for both thread-level
and process-level parallel programming. This model avoids
the introduction of read/write data races, and converts write/
write races into reliably detected conflicts. Coarse-grained
parallel benchmarks perform and scale comparably to nondeterministic systems, both on multicore PCs and across
nodes in a distributed cluster.
1. iNtRoDuCtioN
The multicore revolution has made parallel programming
both ubiquitous and unavoidable, but building reliable
parallel software is difficult and error-prone. Mainstream
parallel programming models introduce pervasive risks
for timing-dependent races, leading to nondeterministic “heisenbugs.” Slight synchronization bugs often
cause nondeterministic races between threads accessing shared memory.
2, 20, 21 Concurrent processes can similarly race when passing messages or accessing files in a
shared file system, yielding not just bugs but also security
vulnerabilities.
25
We would often prefer to run software deterministically,
so that from a given input it always produces the same
output.
20 Beyond mere convenience considerations, deterministic execution is also a basic virtualization tool: a foundation required to implement replay debugging,
19 fault
tolerance,
11 and accountability mechanisms.
16 Methods of
intrusion analysis12 and timing channel control4 further
require the system to enforce determinism, even on malicious code that might be designed to evade analysis.
User-space techniques for parallel deterministic
execution8, 22 show promise but have limitations. First,
by relying on a deterministic scheduler residing in the
application process, they permit buggy or malicious
applications to compromise determinism by interfer-
ing with the scheduler. Second, deterministic schedulers
emulate conventional APIs by synthesizing a repeatable—
but arbitrary—schedule of inter-thread interactions, often
using an instruction counter as an artificial time metric.
Therefore, data races remain, but their manifestation
depends subtly on inputs and code path lengths instead
of on “real” time. Third, the user-level instrumentation
required to isolate and schedule threads’ memory accesses
can incur considerable overhead, even on coarse-grained
code that synchronizes rarely.
The original version of this paper was published in the
9th USENIX Symposium on Operating Systems Design and
Implementation, October 4–6, 2010, Vancouver, BC, Canada.