freed objects, all overwritten with canary values. All of these
objects would then be potential sources of dangling pointer
errors.
In cumulative mode, Exterminator prevents this scenario
by making a random choice every time an object is freed;
rather than always filling the freed object with canaries and
setting the associated canary bit, it performs this filling
and bit-setting action with probability p. This probabilistic approach may seem to degrade Exterminator’s ability to
find errors. However, it is required to isolate read-only dangling pointer errors, where the canary itself remains intact.
Because it would take an impractically large number of iterations or replicas to isolate these errors, Exterminator always
fills freed objects with canaries when not running in cumulative mode (see Sections 5. 2 and 7. 2 for discussion).
3. 3. 4. Probabilistic Error Detection
Whenever DieFast allocates memory, it examines the memory to be returned to verify that any canaries it is supposed to
contain (as indicated by the canary bitset) are intact. If not, in
addition to signaling an error (see Section 3. 4), DieFast sets
the allocated bit for this chunk of memory. This “bad object
isolation” ensures that the object will not be reused for
future allocations, preserving its contents for Exterminator’s
subsequent use. By checking canary integrity on each allocation, DieHard can be expected to detect heap corruption
within approximately h allocations, where h is the number
of objects on the heap.
After every deallocation, DieFast checks both the preceding
and subsequent objects. For each of these, DieFast checks if
they are free. If so, it performs the same canary check as above.
Recall that because DieFast’s allocation is random, the identity
of these adjacent objects will differ from run to run. Checking
both the subsequent and the preceding objects on each free
allows DieFast to perform an inexpensive check for any nearby
out-of-bound writes, including “strided” object writes (e.g.,
a [i + 32]) that might jump over a subsequent object.
3. 4. modes of operation
Exterminator can be used in three modes of operation: an
iterative mode suitable for testing or whenever all program
inputs can be made available for repeated execution, a replicated mode that is suitable both for testing and for restricted
deployment scenarios, and a cumulative mode that is suitable for broad deployment. All of these rely on the generation of heap images, which Exterminator examines to isolate
errors and compute runtime patches.
If Exterminator discovers an error when executing a program, or if DieFast signals an error, Exterminator forces the
process to emit a heap image file. This file is akin to a core
dump, but contains less data (e.g., no code) and is organized
to simplify processing. In addition to the full heap contents
and heap metadata, the heap image includes the current
allocation time (i.e., the number of allocations to date).
3. 4. 1. Iterative Mode
Exterminator’s iterative mode operates without replication.
To find a single bug, Exterminator is initially invoked via a
command-line option that directs it to stop as soon as it
90 communications of the acm | dEcEmbEr 2008 | voL. 51 | No. 12
detects an error. Exterminator then re-executes the program
in “replay” mode over the same input (but with a new random seed). In this mode, Exterminator reads the allocation
time from the initial heap image to abort execution at that
point; we call this a malloc breakpoint. Exterminator then
begins execution and ignores DieFast error signals that are
raised before the malloc breakpoint is reached.
Once it reaches the malloc breakpoint, Exterminator triggers another heap image dump. This process can be repeated
multiple times to generate independent heap images.
Exterminator then performs postmortem error isolation and
runtime patch generation. A small number of iterations usually suffices for Exterminator to generate runtime patches
for an individual error, as we show in Section 7. 2. When run
with a correcting memory allocator that incorporates these
changes (described in detail in Section 6. 3), these patches
automatically fix the isolated errors.
3. 4. 2. Replicated Mode
The iterated mode described above works well when all
inputs are available so that rerunning an execution is feasible.
However, when applications are deployed in the field, such
inputs may not be available, and replaying may be impractical. The replicated mode of operation allows Exterminator
to correct errors while the program is running, without the
need for multiple iterations.
As Figure 3 shows, Exterminator (like DieHard) can run a
number of differently randomized replicas simultaneously
(as separate processes), broadcasting inputs to all and voting
on their outputs. However, Exterminator uses DieFast-based
heaps, each with a correcting allocator. This organization
lets Exterminator discover and fix errors.
In replicated mode, when DieFast signals an error or
the voter detects divergent output, Exterminator sends a
signal that triggers a heap image dump for each replica.
Exterminator also dumps heap images if any replica crashes
because of a segmentation fault.
If DieFast signals an error, the replicas that dump a
heap image do not have to stop executing. If their output
figure 3: exterminator’s replicated architecture (section 3. 4).
Replicas are equipped with different seeds that fully randomize their
Diefast-based heaps (section 3. 3), input is broadcast to all replicas,
and output goes to a voter. a crash, output divergence, or signal
from Diefast triggers the error isolator (section 4), which generates
runtime patches. these patches are fed to correcting allocators
(section 6), which fix the bug for current and subsequent executions.
Error isolator
Runtime
patches
Input
correcting allocator
seed DieFast replica1
correcting allocator
seed DieFast replica2
Output
Broadcast
Vote
correcting allocator