allocated from the site. It then uses the distance between that
object and the end of the corruption as the pad size.
5. 2. Dangling pointer isolation
As with buffer overflows, Exterminator’s dangling pointer
isolator computes summary information over multiple runs.
To force each run to have a different effect, Exterminator fills
freed objects with canaries with some probability p, turning
every execution into a series of Bernoulli trials. In this scenario, if the program reads canary data through the dangling
pointer, the program may crash. Thus writing the canary for
that object increases the probability that the program will
later crash. Conversely, if an object is not freed prematurely,
then overwriting it with canaries has no influence on the failure or success of the program. Exterminator then uses the
same hypothesis testing framework as its buffer overflow
algorithm to identify sources of dangling pointer errors.
The choice of p reflects a trade-off between the precision of the buffer overflow algorithm and dangling pointer
isolation. Since overflow isolation relies on detecting corrupt canaries, low values of p increase the number of runs
(though not the number of failures) required to isolate overflows. However, lower values of p increase the precision of
dangling pointer isolation by reducing the risk that certain
allocation sites (those that allocate large numbers of objects)
will always observe one canary value. We currently set p = 1/2,
though some dangling pointer errors may require lower values of p to converge within a reasonable number of runs.
Exterminator then estimates the required lifetime extension by locating the oldest canaried object from an identified
allocation site, and computing the number of allocations
between the time it was freed and the time that the program
failed. The correcting allocator then extends the lifetime of
all objects corresponding to this allocation/deallocation site
by twice this number.
6. eRRoR coRRection
We now describe how Exterminator uses the information
from its error isolation algorithms to correct specific errors.
Exterminator first generates runtime patches for each error.
It then relies on a correcting allocator that uses this information, padding allocations to prevent overflows, and deferring
deallocations to prevent dangling pointer errors.
Exterminator’s ability to correct memory errors has several inherent limitations. Exterminator can only correct
finite overflows, because it tries to contain any given overflow by finite over-allocation. Similarly, Exterminator corrects dangling pointer errors by inserting finite delays before
freeing particular objects. Finally, Exterminator cannot correct memory errors when the evidence it uses to locate these
errors is destroyed, such as when an overflow overwrites most
of the heap, or when a program with a dangling pointer error
runs long enough to reallocate the dangled object.
6. 1. Buffer overflow correction
For every culprit–victim pair that Exterminator encounters, it
generates a runtime patch consisting of the allocation site hash
and the amount of padding needed to contain the overflow (d +
the size of the overflow string). If a runtime patch has already
been generated for a given allocation site, Exterminator uses
the maximum padding value encountered so far.
6. 2. Dangling pointer correction
The runtime patch for a dangling pointer consists of the combination of its allocation site hash and an amount of time by
which to delay its deallocation. Exterminator computes this
delay as follows. Let t be the recorded deallocation time of
the dangled object, and T be the allocation time at which the
program crashed or Exterminator detected heap corruption.
Exterminator has no way of knowing how long the object is
supposed to live, so computing an exact delay is impossible.
Instead, it extends the object’s lifetime (delays its freeing)
by twice the distance between its premature freeing and the
time of crashing or detection, plus one: 2 × (T − t) + 1.
It is important to note that this deallocation deferral does
not multiply object lifetimes but rather their drag.
15 To illustrate, an object might live for 1000 allocations and then be
freed just 10 allocations too soon. If the program immediately crashes, Exterminator will extend its lifetime by 21 allocations, increasing its correct lifetime (1010 allocations) by
less than 1% (1021/1010).
6. 3. the correcting memory allocator
The correcting memory allocator incorporates the runtime patches described above and applies them when
appropriate.
At start-up, or upon receiving a reload signal (Section 3. 4),
the correcting allocator loads the runtime patches from a
specified file. It builds two hash tables: a pad table mapping
allocation sites to pad sizes, and a deferral table mapping
pairs of allocation and deallocation sites to a deferral value.
Because it can reload the runtime patch file and rebuild these
tables on the fly, Exterminator can apply patches to running
programs without interrupting their execution. This aspect
of Exterminator’s operation may be especially useful for systems that must be kept running continuously.
On every deallocation, the correcting allocator checks to
see if the object to be freed needs to be deferred. If it finds
a deferral value for the object’s allocation and deallocation
site, it pushes onto the deferral priority queue the pointer
and the time to actually free it (the current allocation time
plus the deferral value).
The correcting allocator checks the deferral queue on every
allocation to see if any object should now be freed. It then
checks whether the current allocation site has an associated
pad size. If so, it adds the pad size to the allocation request, and
forwards the allocation request to the underlying allocator.
6. 4. collaborative correction
Each individual user of an application is likely to experience
different errors. To allow an entire user community to automatically improve software reliability, Exterminator provides
a simple utility that supports collaborative correction. This
utility takes as input a number of runtime patch files. It then
combines these patches by computing the maximum pad
size required for any allocation site, and the maximal deferral amount for any given allocation site/deallocation site pair.
The result is a new runtime patch file that covers all observed