When the kernel migrates a space, it first transfers to the
receiving kernel only the space’s register state and address
space metadata. The receiving kernel requests the space’s
memory pages on demand as the space accesses them on the
new node. Each node’s kernel avoids redundant cross-node
page copying in the common case when a space repeatedly
migrates among several nodes—for example, when a space
starts children on each of several nodes, then returns later
to collect their results. For read-only pages such as program
code, each node’s kernel reuses cached copies of these pages
whenever the space returns to that node.
4. hiGh-LeVeL PARALLeL ABstRACtioNs
Determinator’s kernel API eliminates many convenient and
familiar abstractions, but we can reproduce many of them
deterministically in user space, with important trade-offs.
This section describes how Determinator’s user-level runtime infrastructure emulates traditional Unix processes, file
systems, threads, and synchronization.
4. 1. Processes and fork/exec/wait
We make no attempt to replicate Unix process semantics
exactly, but wish to emulate traditional fork/exec/wait
enough to support common uses in scriptable shells,
build tools such as make, and multiprocess applications
such as compilers.
fork: A basic Unix fork() requires only one Put sys-
tem call, to copy the parent’s entire memory state into a
child space, set up the child’s registers, and start the child.
The difficulty arises from Unix’s global process ID (PID)
namespace, a source of nondeterminism as discussed in
Section 2. 4. Since most applications use PIDs returned by
fork() merely as an opaque argument to a subsequent
waitpid(), our runtime makes PIDs local to each process.
Each process allocates its own PIDs, which are unrelated to
and may numerically conflict with PIDs in other processes.
This change breaks Unix applications that pass PIDs among
processes, and means that commands like “ps” must be
built into shells for the same reason that “cd” already is.
This approach works for compute-oriented applications
following the typical fork/wait pattern, however.
exec: A user-level implementation of Unix exec() must
construct the new program’s memory image, intended to
replace the old program, while still executing the old program’s runtime library code. Our runtime loads the new program into a “reserved” child space never used by fork(),
then calls Get to copy that child’s entire memory atop that of
the (running) parent: this Get thus “returns” into the new program. The runtime also carries over some Unix process state,
such as the PID namespace and file system state described
later, from the old to the new program.
Wait: When an application calls waitpid() to wait for a
specific child, the runtime calls Get to synchronize with the
child’s Ret and obtain the child’s exit status. The child may
Ret back to the parent to make I/O requests before terminating; the parent’s runtime services the I/O request and
resumes the waitpid() transparently to the application.
Unix’s wait() is problematic as it waits for any (“the
first”) child to terminate, violating determinism as discussed
in Section 2. 3. The kernel cannot offer a “wait for any child”
system call without compromising determinism, so our run-
time simply waits for the child that was forked earliest.
4. 2. A shared file system
Unix’s file system provides a convenient namespace and
repository for staging program inputs, storing outputs, and
holding intermediate results such as temporary files. Since
our kernel permits no physical state sharing, user-level code
must emulate shared state abstractions. Determinator’s
“shared-nothing” space hierarchy is similar to a distributed system consisting only of uniprocessor machines, so
our user-level runtime borrows distributed file system principles to offer applications a shared file system abstraction.
figure 4. Parallel make under unix and Determinator: (a) and (b) with
unlimited parallelism; (c) and (d) with a 2-worker quota imposed at
Task 2 Task 3
Time all wait()s return
Task 1 CPU 1
Task 2 Task 3
all wait()s return
Task 2 Task 3
“make -j2” on Unix
Task 1 CPU 1
“make -j2” on Determinator