random write not landing on a canary across all k heaps is
thus at most ( 1 − (M − 1)/2M)k. The overflow string could also
match the canary value. Since the canary is randomly chosen,
the odds of this are at most (1/256)b. □
4. 2. culprit identification
At this point, Exterminator has identified the possible victims
of overflows. For each victim, it scans the heap images for a
matching culprit, the object that is likely to be the source of
the overflow into a victim. Because Exterminator assumes
that overflows are deterministic when operating in iterative
or replicated mode, the culprit must be the same distance d
bytes away from the victim in every heap image. In addition,
Exterminator requires that the overflowed values have some
bytes in common across the images, and ranks them by their
similarity. Note that, while Exterminator only considers positive values of d, these values may be arbitrarily large.
Exterminator checks every other heap image for the candidate culprit, and examines the object that is the same d bytes
forward. If that object is free and should be filled with canaries but they are not intact, then it adds this culprit–victim
pair to the candidate list.
We now bound the false-positive rate. Because buffer
overflows can be discontiguous, every object in the heap that
precedes an overflow is a potential culprit. However, each
additional heap dramatically lowers this number.
Theorem 3. The expected number of objects (possible culprits)
the same distance d from any given victim object across k heaps is
1
E( possible culprits) = .
( h −
1)k−
2
Proof. Without loss of generality, assume that the victim
object occupies the last slot in every heap. An object can thus
be in any of the remaining n = h − 1 slots. The odds of it being
in the same slot in k heaps is p = 1/ (h − 1)k− 1. This is a binomial distribution, so E(possible culprits) = np= 1/(h − 1)k− 2. □
With only one heap image, all (h − 1) objects are potential culprits, but one additional image reduces the expected
number of culprits for any victim to just 1 (1/(h − 1)0), effectively eliminating the risk of false positives.
Once Exterminator identifies a culprit–victim pair, it
records the overflow size for that culprit as the maximum of
any observed d to a victim. Exterminator also assigns each culprit–victim pair a score that corresponds to its confidence that
it is an actual overflow. This score is 1 − (1/256)S, where S is the
sum of the length of detected overflow strings across all pairs.
Intuitively, small overflow strings (e.g., 1 byte) detected in only
a few heap images are given lower scores, and large overflow
strings present in many heap images get higher scores.
After overflow processing completes and at least one culprit has a nonzero score, Exterminator generates a runtime
patch for an overflow from the most highly ranked culprit.
4. 3. Dangling pointer isolation
Isolating dangling pointer errors falls into two cases: a program may read and write to the dangled object, leaving it partially or completely overwritten, or it may only read through
92 communications of the acm | dEcEmbEr 2008 | voL. 51 | No. 12
the dangling pointer. Exterminator does not handle read-only dangling pointer errors in iterative or replicated mode
because it would require too many replicas (e.g., around 20;
see Section 7. 2). However, it handles overwritten dangling
objects straightforwardly.
When a freed object is overwritten with identical values
across multiple heap images, Exterminator classifies the
error as a dangling pointer overwrite. (As Theorem 1 shows,
this situation is highly unlikely to occur for a buffer overflow.)
Exterminator then generates an appropriate runtime patch,
as Section 6. 2 describes.
5. cumuLative eRRoR isoLation
Unlike iterative and replicated mode, cumulative mode
focuses on detecting, isolating, and correcting errors that
happen in the field. In this context, replication, identical
inputs, and deterministic execution are infeasible. Worse,
program errors may manifest themselves in ways that are
inherently hard to detect. For example, a program that reads
a canary written into a free object may fail immediately, or
may execute incorrectly for some time.
Our approach to error detection in this mode is to consider
exceptional program events, such as premature termination,
raising unexpected signals, etc., to be evidence that memory
was corrupted during execution. We counter the lack of error
reproducibility in these cases with statistical accumulation
of evidence before assuming an error needs to be corrected.
Exterminator isolates memory errors in cumulative mode by
computing summary information accumulated over multiple executions, rather than by operating over multiple heap
images.
5. 1. Buffer overflow detection
Exterminator’s buffer overflow isolation algorithm proceeds
in three phases. First, it identifies heap corruption by looking for overwritten canary values. Second, for each allocation
site, it computes an estimate of the probability that an object
from that site could be the source of the corruption. Third, it
combines these independent estimates from multiple runs
to identify sites that consistently appear as candidates for
causing the corruption.
Exterminator’s randomized allocator allows us to compute the probability of certain properties in the heap. For
example, the probability of an object occurring on a given
miniheap can be estimated given the miniheap size and the
number of miniheaps. If objects from some allocation site
are sources of overflows, then those objects will occur on
miniheaps containing corruptions more often than expected.
Exterminator tracks how often objects from each allocation
site occur on corrupted miniheaps across multiple runs.
Using this information, it uses a statistical hypothesis test
that identifies sites that occur with corruption too often to be
random chance, and identifies them as overflow culprits (see
Novark11 for more details).
Once Exterminator identifies an erroneous allocation
site, it produces a runtime patch that corrects the error. To
find the correct pad size, it searches backward from the corruption found during the current run until it finds an object