Assertions within the system under test. Many applications make liberal use of asserts to guard against illegal conditions, but fuzz testing relies on them to
catch application logic errors. It wreaks
havoc in your codebase and assumes you
have instrumented your application’s
components such that havoc is noticed.
For example, when acting as a secondary in a MongoDB replica set, mongod has an assertion to halt if it fails to write an operation.9 If a primary node
logs a write for its secondaries, they
had better be able to perform the write
as well, or we will wind up with serious
data loss when failovers happen. Since
these assertions are fatal errors, the
testing framework immediately notices
when fuzz tests trigger them.
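The pattern can be sketched in the shell's JavaScript, though mongod's real check lives in its C++ oplog-application code; fassert and applyOplogEntry here are illustrative names, not MongoDB's actual API:

    // A fatal assertion aborts the whole process instead of throwing an
    // exception, so the testing framework cannot fail to notice it.
    function fassert(ok, msg) {
        if (ok) return;
        print("Fatal assertion: " + msg);
        quit(1);  // shell builtin: exit immediately with a failure code
    }

    // e.g., a secondary applying a write logged by the primary
    // (applyOplogEntry is a hypothetical helper):
    // fassert(applyOplogEntry(entry), "failed to apply logged write");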
The limitation of randomized testing.
This is really the only way that assertions
can be used to catch errors provoked by
randomly generated tests. Assertions in
the target program can be oblivious to
the tests being run; indeed, they must
hold true under all circumstances (
including when the program is being run
by a user). In contrast, assertions within
tests must be specific to the test scenario. As described below, however, fuzzer-generated tests, by their nature, cannot carry effective test-specific assertions, since every generated statement ends up wrapped in a try/catch. So under truly random conditions, a fuzzer will trigger no tailored assertions. This is
a limitation of all randomized testing
techniques, and it is why any good testing framework must not rely solely on
randomized testing.
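To make the contrast concrete, here is a sketch in the style of MongoDB's JavaScript tests (assert.eq is the comparison helper those tests use; the collection setup is invented for illustration):

    // An assertion in the target program must hold for any input at all,
    // for example, "a collection's document count is never negative."
    // An assertion inside a test is tied to its scenario:
    var coll = db.example;
    coll.drop();
    coll.insert({a: 1});
    coll.insert({a: 2});
    assert.eq(2, coll.count());  // true only because of the setup above;
                                 // after fuzzing, that setup may be gone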
Triaging a Fuzzer Failure
Tests that perform random code execution and rely on target system assertions
have some downsides: the problems
they find have no predefined purpose;
many of the operations within them
might be innocuous noise; and the errors they produce are often convoluted.
Failures observed at a particular line of the test might rely on state set up by previous operations, so parts of the codebase that may turn out to be unrelated have to be
examined and understood.
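Much of that work amounts to test-case reduction. A minimal sketch of a greedy reducer, assuming a reproduces(ops) helper that replays a list of operations and reports whether the failure still occurs (the helper and the strategy are illustrative, not MongoDB's actual tooling):

    // Repeatedly drop single operations, keeping any removal that still
    // reproduces the failure, until no single operation can be removed.
    function minimize(ops, reproduces) {
        var changed = true;
        while (changed) {
            changed = false;
            for (var i = 0; i < ops.length; i++) {
                var candidate = ops.slice(0, i).concat(ops.slice(i + 1));
                if (reproduces(candidate)) {
                    ops = candidate;  // the i-th operation was irrelevant
                    changed = true;
                    break;            // rescan the smaller test from the top
                }
            }
        }
        return ops;  // locally minimal: every remaining operation matters
    }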
Thus, fuzzer failures require triage to find the smallest set of operations that trigger the problem. This can take significant human intervention, as with the known issue17 where calling cursor.explain()6 with concurrent clients
causes a segmentation fault. The test
that provoked this issue used a dozen
clients performing different operations
concurrently, so besides understanding the failing operation itself, the activity of the other clients had to be ruled out as well.
In another case, the fuzzer seeded a very large NumberLong into a test that updated a numeric value and checked the result. When the actual test ran, it turned out that the new
value would still be a large number, but
not the correct one. The bug was that
MongoDB stored the integer as a double internally, which has only 53 bits of precision.13 The fuzzer was able to find this by generating the large NumberLong, which did not appear in any test.
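The precision cliff is easy to see in JavaScript itself, whose Number type is the same 64-bit double:

    var big = Math.pow(2, 53);  // 9007199254740992
    print(big === big + 1);     // true: the next integer up cannot be
                                // represented, so 2^53 + 1 collapses to 2^53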
The combination of mutational fuzzing with the edge cases we seed into the generational fuzzer is an order of magnitude more powerful than writing tests
for these edge cases explicitly. In fact, a
significant portion of the bugs the fuzzer found were triggered by values generated in this way.
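A sketch of how such seeding might look: the generational step keeps a pool of interesting values and occasionally substitutes one for a literal in the mutated test (the pool contents and the pickLiteral helper are illustrative, not the fuzzer's actual list):

    // Known-dangerous values to splice into generated tests.
    var edgeCases = [
        0, -0, 1, -1,
        Math.pow(2, 53),                    // double precision limit
        NumberLong("9223372036854775807"),  // largest 64-bit integer
        "", null, undefined, NaN, Infinity
    ];

    // With small probability, replace a literal with an edge case.
    function pickLiteral(original) {
        return Math.random() < 0.1
            ? edgeCases[Math.floor(Math.random() * edgeCases.length)]
            : original;
    }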
An Unbridled Fuzzer
Creates Too Much Noise
Ultimately, fuzz testing is a game of
random numbers. Random numbers
make the fuzzer powerful but can
cause unforeseen problems. We needed to take some steps to ensure the
fuzzer does not blow itself up. Take the
following block of code, which resembles something that would be present
in one of MongoDB’s JavaScript tests:
while (coll.count() < 654321)
    assert(coll.update({a: 1}, {$set: {...}}))
This code does a large number of
updates to a document stored in MongoDB. If we were to put it through the
mutational and generational fuzzing
steps described previously, the fuzzer
could produce this possible test case:
while (true)
    assert(coll.update({}, {$set: {"a.654321": 1}}))
The new code now tests something
completely different. It tries to set the
654321st element in an array stored in all
documents in some MongoDB collection.
This is an interesting test case. Using the $set operator with such a large array may not be something we thought of testing explicitly, and it could trigger a bug (in fact, it does).12 But the interaction between the fuzzed true condition and the residual while loop is going to hang the test, unless the assert call in the while loop fails, which could happen
if the line defining coll in the original test (not shown here) is mutated or deleted by the fuzzer, leaving coll undefined. If the assert call failed, it would be
caught at the top level of the mongo shell and cause it to terminate.
Neither the hang nor the assertion failure, however, is caused by a bug in
MongoDB. They are just by-products
of a randomly generated test case, and
they represent two classes of noise that
must be filtered out of fuzz testing:
branch logic and assertion failures.
Branch logic. To guard against accidental hangs, our fuzzer simply takes out all branching logic via AST manipulation. In addition to while loops, we remove try/catch, break, and continue statements, as well as do/while, for, for/in, and for/of loops. These language structures are defined in a static list.
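A sketch of that pass over an ESTree-style AST (hand-rolled here for illustration; the real fuzzer's implementation details may differ, and this version unwraps loop and try bodies rather than deleting them, which matches the example below):

    // Loop constructs whose residue can hang a generated test.
    var LOOPS = {
        WhileStatement: true, DoWhileStatement: true, ForStatement: true,
        ForInStatement: true, ForOfStatement: true
    };

    // Filter a list of statement nodes, stripping banned constructs.
    function stripBranches(stmts) {
        var out = [];
        stmts.forEach(function (node) {
            if (node.type === "BreakStatement" ||
                node.type === "ContinueStatement") {
                return;  // dropped outright
            }
            if (LOOPS[node.type]) {
                // Keep the loop body but discard the loop itself, so
                // while (true) assert(...) becomes plain assert(...).
                var body = node.body.type === "BlockStatement"
                    ? node.body.body : [node.body];
                out.push.apply(out, stripBranches(body));
                return;
            }
            if (node.type === "TryStatement") {
                out.push.apply(out, stripBranches(node.block.body));
                return;
            }
            out.push(node);
        });
        return out;
    }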
Assertion failures. For the assertion failures, every single line of generated test code is wrapped in a try/catch statement. All the logic will still be executed, but no client-side errors will propagate up and cause a failure.
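The wrapping is a small AST transform of its own; a sketch in the same ESTree style (illustrative only):

    // Wrap a single statement node in: try { <stmt> } catch (e) {}
    function wrapInTryCatch(stmt) {
        return {
            type: "TryStatement",
            block: { type: "BlockStatement", body: [stmt] },
            handler: {
                type: "CatchClause",
                param: { type: "Identifier", name: "e" },
                body: { type: "BlockStatement", body: [] }  // swallow errors
            },
            finalizer: null
        };
    }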
After passing through this sanitizing phase, our earlier example now
looks like this:
try {
    assert(coll.update({}, {$set: {"a.654321": 1}}))
} catch (e) {}
So How Does the Fuzzer Catch Bugs?
Wrapping everything in a try/catch
block keeps fuzzer-generated noise
from overwhelming us with false positives, but it also prevents any bugs from
surfacing through the client-side assertions our typical tests rely on. Instead, a fuzzer has to rely on other mechanisms to detect the errors it provokes.
Tools for generic errors. The first set of tools consists of ones we are using anyway for finding segmentation faults, memory leaks, and undefined behavior. Even without a fuzz tester, we would still be using these language runtime tools,3 such as LLVM's address sanitizer4 and undefined behavior sanitizer,5 but they become far
more useful when a fuzzer is bombarding
the test target with all its random input.
These tools are good for generic
coding errors, but they don’t validate
that a program is behaving as expected by end users. To catch issues with
business logic, our fuzzer relies on assertions within the testing target that check for states the system should never be in.