which state the operations in the test set
up, log messages from all the client and
server threads had to be inspected manually and correlated with each other.
All this work is typical of triaging a
fuzzer test failure, so we built a set of features that help developers sift through
the chaos. These are specific to testing a
MongoDB cluster across the network using JavaScript but can be used as inspiration for all fuzzing projects.
We are only interested in the lines of
code that send commands to a MongoDB
server, so the first step is to isolate those.
Using our trusty AST manipulator, we
add a print statement after every line of
fuzzer code to record the time it takes to
run. Lines that take a nontrivial amount
of time to run typically run a command
and communicate with the MongoDB
server. With those timers in place, our
fuzz tests look like this:
var $startTime = Date.now();
try {
    // a fuzzer generated line of code
} catch (e) {
}
var $endTime = Date.now();
print('Top-level statement 0 completed in', $endTime - $startTime, 'ms');

var $startTime = Date.now();
try {
    // a fuzzer generated line of code
} catch (e) {
}
var $endTime = Date.now();
print('Top-level statement 1 completed in', $endTime - $startTime, 'ms');
// etc.
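As a rough sketch of how such wrappers might be produced, the helper below takes top-level statements that have already been split into strings and emits the timed form shown above. The name `wrapWithTimers`, the string concatenation, and the two placeholder statements are assumptions for illustration only; the fuzzer described here rewrites the AST rather than joining strings.

```javascript
// Hypothetical helper: wraps each top-level statement in a try/catch and
// surrounds it with timing code, mirroring the listing above.
function wrapWithTimers(statements) {
  return statements.map(function (stmt, index) {
    return [
      'var $startTime = Date.now();',
      'try {',
      '  ' + stmt,
      '} catch (e) {',
      '}',
      'var $endTime = Date.now();',
      "print('Top-level statement " + index +
        " completed in', $endTime - $startTime, 'ms');"
    ].join('\n');
  }).join('\n');
}

// Two placeholder statements (not real fuzzer output).
var instrumented = wrapWithTimers(['db.coll.find({});', 'db.coll.drop();']);
```

Because every statement is wrapped in its own try/catch, an exception in one statement does not stop the timing line from being printed for it.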
When we get a failure, we find the
last statement that completed successfully from the log messages, and
the next actual command that runs is
where the triage begins.
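Finding that last completed statement can be automated with a small log scan. The sketch below assumes the log contains the timing lines exactly as printed above; the function name `lastCompletedStatement` is hypothetical.

```javascript
// Hypothetical log scanner: returns the index of the last top-level statement
// whose timing line appears in the log, or -1 if none completed.
function lastCompletedStatement(logText) {
  var pattern = /Top-level statement (\d+) completed in/g;
  var last = -1;
  var match;
  while ((match = pattern.exec(logText)) !== null) {
    last = Number(match[1]);
  }
  return last;
}
```

Triage would then start at the statement following the returned index.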
This technique would be sufficient
for identifying the trivial bugs that can
cause the server to crash with one or two
lines of test code. More complicated
bugs require programmatic assistance
to find exactly which lines of test code
are causing the problem. We bisect our
way toward that with a breadth-first binary
search over each fuzzer-generated
file. Our script recursively generates
new tests containing each half of the
failed code until any further removal no
longer causes the test to fail.
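A minimal sketch of this kind of reduction follows. It is not the actual script: `bisectFailure` and the `stillFails` callback (assumed to run a candidate test and report whether the original failure reproduces) are hypothetical names, and the test is modeled as a plain array of statements.

```javascript
// Hypothetical reducer: repeatedly splits the failing test in half,
// breadth-first, and keeps the smallest half that still fails.
function bisectFailure(statements, stillFails) {
  var queue = [statements]; // breadth-first work list
  var smallest = statements;
  while (queue.length > 0) {
    var current = queue.shift();
    if (current.length < 2) {
      continue; // a single statement cannot be split further
    }
    var mid = Math.floor(current.length / 2);
    var halves = [current.slice(0, mid), current.slice(mid)];
    halves.forEach(function (half) {
      if (stillFails(half)) {
        if (half.length < smallest.length) {
          smallest = half;
        }
        queue.push(half);
      }
    });
  }
  return smallest;
}
```

Under these assumptions the search stops at the smallest range for which neither half fails on its own, so a bug that needs statements from both halves is narrowed only to the enclosing range.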
The binary search script is not a cure-all, though. Some bugs do not reproduce
consistently, or cause hangs, and require
a different set of tools. The particular
tools will depend entirely on your product, but one simple way to identify hangs
is to use a timer. We record the runtime of
a test suite, and if it takes an order of magnitude longer than the average runtime,
we assume it has hung, attach a debugger,
and generate a core dump.
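The timer check itself can be very small. The sketch below assumes runtimes are recorded in milliseconds and treats "an order of magnitude" as a factor of 10; the name `looksHung` and its parameters are hypothetical, and attaching a debugger is left to whatever tooling the product uses.

```javascript
// Hypothetical check: flags a run as hung when it exceeds `factor` times the
// average of previously recorded runtimes (all values in milliseconds).
function looksHung(elapsedMs, pastRuntimesMs, factor) {
  factor = factor === undefined ? 10 : factor;
  if (pastRuntimesMs.length === 0) {
    return false; // no baseline yet, so nothing to compare against
  }
  var total = pastRuntimesMs.reduce(function (sum, ms) { return sum + ms; }, 0);
  var average = total / pastRuntimesMs.length;
  return elapsedMs > factor * average;
}
```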
Through the use of timers, print
statements, and the binary search script, we
are able to triage the majority of our failures quickly and correctly. There is no
panacea for debugging—every problem
is new, and most require a bit of trial
and error to get right. We are continuously investing in this area to speed up
and simplify failure isolation.
Running the Fuzzer in the CI System
Fuzz testing is traditionally done in dedicated clusters that run periodically on select commits, but we decided to include
it as a test suite in our CI framework,
Evergreen. This saved us the effort of
building out a new automated testing environment and spared us from dedicating
resources to determining which commit
introduced a bug.
When a fuzzer is invoked periodically,
finding the offending commit requires
using a tool such as git-bisect [2]. With our
approach of a mutational fuzzer that runs
in a CI framework, we always include
newly committed tests in the corpus.
Every time the fuzzer runs, we pick 150
sets of a few dozen files from the corpus
at random and run each one through
the fuzzer to generate 150 fuzzed files.
Each set of corpus files always includes
new logic added to the codebase, which
means the fuzzed tests are likely testing
new code as well. This is a simple and elegant way for the fuzzer to “understand”
changes to the codebase without the
need for significant work to parse source
files or read code coverage data.
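The random selection step might look like the sketch below. The text does not describe how files are chosen beyond "at random", so the partial Fisher-Yates shuffle, the function names `sampleFiles` and `pickCorpusSets`, and the optional `random` parameter are all assumptions.

```javascript
// Hypothetical sampler: picks `setSize` distinct files from `corpus` using a
// partial Fisher-Yates shuffle. `random` defaults to Math.random.
function sampleFiles(corpus, setSize, random) {
  random = random || Math.random;
  var pool = corpus.slice(); // copy so the caller's array is untouched
  var count = Math.min(setSize, pool.length);
  for (var i = 0; i < count; i++) {
    var j = i + Math.floor(random() * (pool.length - i));
    var tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  return pool.slice(0, count);
}

// Builds the groups of corpus files that each seed one fuzzed file.
function pickCorpusSets(corpus, setCount, setSize, random) {
  var sets = [];
  for (var n = 0; n < setCount; n++) {
    sets.push(sampleFiles(corpus, setSize, random));
  }
  return sets;
}
```

A call such as `pickCorpusSets(corpusFiles, 150, 36)` would yield 150 sets, with 36 standing in for "a few dozen". Since newly committed tests are part of the corpus, they become candidates for selection without any extra bookkeeping.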
When a fuzz test causes a failure, the
downstream effect is the same as any
other kind of test failure, only with the
extra requirement of triage.
The Fuzzer: Your Best Friend
Overall, the fuzzer has turned out to be
one of the most rewarding tools in the
MongoDB test infrastructure. Building
off our existing suite of JavaScript tests,
we were able to increase our coverage
significantly with relatively little effort.
Getting everything right takes time, but
to get a bare-bones system started,
all you need is a set of existing tests as
the corpus, a syntax-tree parser for the
language of your choice, and a way to
add the framework to a CI system. The
bottom line is that no matter how much
effort is put into testing a feature, there
will inevitably be that one edge case that
was not handled. In those face-palm
moments, the fuzzer is there for you.
References
1. Acorn; https://github.com/ternjs/acorn.
2. Chacon, S., Straub, B. Git-bisect; https://git-scm.com/book/en/v2.
3. Clang 3.8 Documentation. Using Clang as a compiler; http://releases.llvm.org/3.8.0/tools/clang/docs/index.html#using-clang-as-a-compiler.
4. Clang 3.8 Documentation. AddressSanitizer; http://releases.llvm.org/3.8.0/tools/clang/docs/AddressSanitizer.html.
5. Clang 3.8 Documentation. UndefinedBehaviorSanitizer; http://releases.llvm.org/3.8.0/tools/clang/docs/UndefinedBehaviorSanitizer.html.
6. Cursor.explain(). MongoDB Documentation; https://docs.mongodb.com/manual/reference/method/cursor.explain/.
7. Déjà vu Security. Generation fuzzing. Peach Fuzzer, 2014; http://community.peachfuzzer.com/GenerationMutationFuzzing.html.
8. Erf, K. Evergreen continuous integration: Why we reinvented the wheel. MongoDB Engineering Journal, 2016; https://engineering.mongodb.com/post/evergreen-continuous-integration-why-we-reinvented-the-wheel/.
9. GitHub. MongoDB; https://github.com/mongodb/mongo/blob/f5c9d27ca6f0f4e1e2673c64b84b628ac29493ec/src/mongo/db/repl/sync_tail.cpp#L1042.
10. Godefroid, P., Levin, M. Y., Molnar, D. SAGE: Whitebox fuzzing for security testing. Commun. ACM 55, 3 (Mar. 2012), 40-44; http://courses.cs.washington.edu/courses/cse484/14au/reading/sage-cacm-2012.pdf.
11. Guo, R. Mongos segfault when invoking .explain() on certain operations. MongoDB, 2016; https://jira.mongodb.org/browse/SERVER-22767.
12. Guo, R. $push to a large array fasserts on secondaries. MongoDB, 2016; https://jira.mongodb.org/browse/SERVER-22635.
13. Kamsky, A. Update considers a change in numerical type to be a noop. MongoDB, 2016; https://jira.mongodb.org/browse/SERVER-16801.
14. McCloskey, B., et al. Parser API. Mozilla Developer Network, 2015; https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Parser_API#Expressions.
15. Nossum, V., Casasnovas, Q. Filesystem fuzzing with
American Fuzzy Lop. Oracle Linux and VM Development—
Ksplice Team, 2016; https://events.linuxfoundation.org/sites/events/files/slides/AFL filesystem fuzzing, Vault 2016_0.pdf.
16. Ruderman, J. Introducing jsfunfuzz. Indistinguishable from Jesse, 2007; https://www.squarefree.com/2007/08/02/introducing-jsfunfuzz/.
17. Siu, I. Explain(“executionStats”) can attempt to access a collection after it has been dropped. MongoDB, 2016; https://jira.mongodb.org/browse/SERVER-24755.
18. Storch, D. MongoDB, jstests. GitHub, 2016; https://github.com/mongodb/mongo/tree/r3.3.12/jstests.
Robert Guo is a software engineer on the MongoDB
server team, focusing on data consistency and
correctness. He is currently working on MongoDB’s
JavaScript fuzzer.
Copyright held by owner/author.
Publication rights licensed to ACM. $15.00