in the morning, and the results meeting
would follow in the afternoon; as code
size at trials grows it’s not uncommon
to split them across two (or more) days.
Sending people to a trial dramatically raises the incremental cost of each
sale. However, it gives the non-trivial
benefit of letting us educate customers
(so they do not label serious, true bugs
as false positives) and do real-time, ad
hoc workarounds of weird customer
system setups.
The trial structure is a harsh test for
any tool, and there is little time. The
checked system is large (millions of
lines of code, with 20–30 MLOC a possibility). The code and its build system
are both difficult to understand. However, the tool must routinely go from
never seeing the system previously to
getting good bugs in a few hours. Since
we present results almost immediately
after the checking run, the bugs must
be good with few false positives; there
is no time to cherry pick them.
Furthermore, the error messages
must be clear enough that the sales engineer (who didn’t build the checked
system or the tool) can diagnose and
explain them in real time in response
to “What about this one?” questions.
The most common usage model for
the product has companies run it as
part of their nightly build. Thus, most
require that checking runs complete in
12 hours, though those with larger code
bases (10+ MLOC) grudgingly accept
24 hours. A tool that cannot analyze
at least 1,400 lines of code per minute
makes it difficult to meet these targets.
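The 1,400 LOC/min figure follows directly from the time windows above. A quick sanity check (plain arithmetic in Python; nothing here is tool-specific) for a roughly one-million-line nightly build and for a 10 MLOC code base in the 24-hour window:

```python
# Back-of-the-envelope check of the throughput requirement described
# above. The 12- and 24-hour windows and code sizes come from the text.

def required_loc_per_minute(total_loc, hours):
    """Lines of code per minute needed to finish within the time window."""
    return total_loc / (hours * 60)

# A ~1 MLOC code base in a 12-hour nightly window:
print(round(required_loc_per_minute(1_000_000, 12)))   # -> 1389, i.e. ~1,400 LOC/min

# A 10 MLOC code base in the grudgingly accepted 24-hour window:
print(round(required_loc_per_minute(10_000_000, 24)))  # -> 6944 LOC/min
```

So the 1,400 LOC/min floor is exactly what a million-line system demands of a 12-hour run; larger code bases push the required rate several times higher.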
During a checking run, error messages
are put in a database for subsequent
triaging, where users label them as
true errors or false positives. We spend
significant effort designing the system
so these labels are automatically reapplied if the error message they refer to
comes up on subsequent runs, despite
code-dilating edits or analysis-changing bug fixes to checkers.
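The text does not describe how this re-application works, but the shape of the problem can be sketched: an error's identity must be computed from features that survive edits elsewhere in the file, so raw line numbers cannot be part of the key. The fingerprint scheme below is purely hypothetical (the checker name, function name, and message fields are illustrative), not Coverity's actual mechanism:

```python
# Hypothetical sketch of a line-number-independent error identity.
# The fingerprint hashes stable features of a report (checker name,
# enclosing function, message text) and deliberately ignores file
# line numbers, so labels survive code-dilating edits above the bug.
import hashlib

def fingerprint(checker, function, message):
    """Stable key for an error report; line numbers are excluded."""
    key = "\x00".join([checker, function, message])
    return hashlib.sha256(key.encode()).hexdigest()[:16]

# The same bug reported before and after unrelated lines are inserted
# above it (shifting its line number) keeps the same fingerprint:
before = fingerprint("NULL_DEREF", "parse_header", "p dereferenced after null check")
after  = fingerprint("NULL_DEREF", "parse_header", "p dereferenced after null check")
assert before == after

triage = {before: "false positive"}        # label stored on an earlier run
print(triage.get(after, "needs triage"))   # prints "false positive"
```

A scheme like this trades precision for stability: a reworded message or renamed function invalidates the label, which is one reason analysis-changing fixes to checkers make label persistence genuinely hard.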
As of this writing (December 2009),
approximately 700 customers have
licensed the Coverity Static Analysis
product, with somewhat more than a
billion lines of code among them. We
estimate that since its creation the tool
has analyzed several billion lines of
code, some more difficult than others.
Caveats. Drawing lessons from a single data point has obvious problems.
Our product’s requirements roughly
form a “least common denominator”
set needed by any tool that uses non-trivial analysis to check large amounts
of code across many organizations; the
tool must find and parse the code, and
users must be able to understand error messages. Further, there are many
ways to handle the problems we have
encountered, and our way may not be
the best one. We discuss our methods
more for specificity than as a claim of
solution.
Finally, while we have had success
as a static-tools company, these are
small steps. We are tiny compared to
mature technology companies. Here,
too, we have tried to limit our discussion to conditions likely to be true in a
larger setting.
Laws of Bug Finding
The fundamental law of bug finding
is No Check = No Bug. If the tool can’t
check a system, file, code path, or given
property, then it won’t find bugs in it.
Assuming a reasonable tool, the first
order bound on bug counts is just how
much code can be shoved through the
tool. Ten times more code is 10 times
more bugs.
We imagined this law was as simple
a statement of fact as we needed. Unfortunately, two seemingly vacuous corollaries place harsh first-order bounds
on bug counts:
Law: You can’t check code you don’t
see. It seems too trite to note that checking code requires first finding it... until
you try to do so consistently on many
large code bases. Probably the most reliable way to check a system is to grab its
code during the build process; the build
system knows exactly which files are included in the system and how to compile them. This seems like a simple task.
Unfortunately, it’s often difficult to understand what an ad hoc, homegrown
build system is doing well enough to extract this information, a difficulty compounded by the near-universal absolute
edict: “No, you can’t touch that.” By default, companies refuse to let an external force modify anything; you cannot
modify their compiler path, their broken makefiles (if they have any), or in any
way write or reconfigure anything other
than your own temporary files. Which is
fine, since if you need to modify it, you
most likely won’t understand it.
Further, for isolation, companies
often insist on setting up a test machine for you to use. As a result, not
infrequently the build you are given to
check does not work in the first place,
which you would get blamed for if you
had touched anything.
Our approach in the initial months
of commercialization in 2002 was a
low-tech, read-only replay of the build
commands: run make, record its output in a file, and rewrite the invocations to their compiler (such as gcc)
to instead call our checking tool, then
rerun everything. Easy and simple.
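The replay idea can be sketched in a few lines. This is a minimal illustration of the record-and-rewrite step, not the actual tool: the `check-tool` name is a placeholder, and a real build log needs far messier parsing (shell variables, recursive make, changed directories) than shown here.

```python
# Minimal sketch of the read-only build replay described above:
# run the build once, capture the commands it echoes, then rewrite
# compiler invocations so a replay calls the checking tool instead.
# "check-tool" is a hypothetical placeholder for the checker binary.
import shlex
import subprocess

def record_build(make_cmd=("make",)):
    """Run the build once and capture every command line it echoes."""
    out = subprocess.run(make_cmd, capture_output=True, text=True).stdout
    return out.splitlines()

def rewrite(commands, compiler="gcc", checker="check-tool"):
    """Replace compiler invocations with calls to the checking tool."""
    rewritten = []
    for cmd in commands:
        argv = shlex.split(cmd)
        if argv and argv[0] == compiler:
            rewritten.append([checker] + argv[1:])  # keep original flags/files
    return rewritten

# A logged line "gcc -c -O2 foo.c" replays as "check-tool -c -O2 foo.c";
# non-compiler commands are simply dropped from the replay.
print(rewrite(["gcc -c -O2 foo.c", "echo done"]))
```

Because the rewrite only reads the recorded log and writes its own temporary files, it respects the "you can't touch that" edict: nothing in the customer's makefiles or compiler path is modified.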
This approach worked perfectly in the
lab and for a small number of our earliest customers. We then had the fol-