system did. On the other hand, the
commercial Coverity product, despite
its improvements, lags behind the research system in some ways because it
had to drop checkers or techniques that
demand too much sophistication on
the part of the user. As an example, for
many years we gave up on checkers that
flagged concurrency errors; while finding such errors was not too difficult, explaining them to many users was. (The
PREfix system also avoided reporting
races for similar reasons, though race
checking is now supported by Coverity.)
No bug is too foolish to check for.
Given enough code, developers will write
almost anything you can think of. Further, completely foolish errors can be
some of the most serious; it’s difficult to
be extravagantly nonsensical in a harmless way. We’ve found many errors over
the years. One of the absolute best was
the following in the X Window System:
if (getuid() != 0 && geteuid == 0) {
    ErrorF("only root");
    exit(1);
}
It allowed any local user to get root
access and generated enormous press
coverage, including a mention on Fox
News (the Web site). The checker was
written by Scott McPeak as a quick hack
to get himself familiar with the system. It
made it into the product not because of
a perceived need but because there was
no reason not to put it in. Fortunately.
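A short C sketch may make the bug concrete. Here `broken_guard` reproduces the check above and `fixed_guard` shows the intended one; the function names are ours, not from the X source:

```c
#include <unistd.h>

/* The X server bug: `geteuid` without parentheses is the function's
 * address. A linked-in function's address is never 0, so
 * `geteuid == 0` is always false and the guard can never fire. */
static int broken_guard(void) {
    return getuid() != 0 && geteuid == 0;   /* always 0 (false) */
}

/* The intended check: non-root real uid with root effective uid,
 * i.e., a setuid-root invocation by an ordinary user. */
static int fixed_guard(void) {
    return getuid() != 0 && geteuid() == 0;
}
```

Because the broken comparison is a compile-time constant, the guard is dead code, which is exactly the kind of tautology a simple checker can flag.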
False Positives
False positives do matter. In our experience, more than 30% easily cause problems. People ignore the tool. True bugs
get lost in the false. A vicious cycle starts
where low trust causes complex bugs
to be labeled false positives, leading to
yet lower trust. We have seen this cycle
triggered even for true errors. If people
don’t understand an error, they label it
false. And done once, induction makes
the (n+1)th time easier. We initially
thought false positives could be eliminated through technology. Because of
this dynamic we no longer think so.
We've spent considerable technical
effort to achieve low false-positive rates
in our static analysis product. We aim
for below 20% for "stable" checkers.
When forced to choose between reporting more
bugs or fewer false positives, we typically choose the latter.
d The tautological check geteuid == 0 was intended to be geteuid() == 0. In its current
form, it compares the address of geteuid to 0; given that the function exists, its address is never 0.
Talking about “false positive rate” is
simplistic since false positives are not
all equal. The initial reports matter inordinately; if the first N reports are false
positives (N = 3?), people tend to utter
variants on “This tool sucks.” Furthermore, you never want an embarrassing false positive. A stupid false positive implies the tool is stupid. (“It’s not
even smart enough to figure that out?”)
This technical mistake can cause social problems. An expensive tool needs
someone with power within a company
or organization to champion it. Such
people often have at least one enemy.
You don’t want to provide ammunition
that would embarrass the tool champion internally; a false positive that fits in
a punchline is really bad.
Conclusion
While we’ve focused on some of the
less-pleasant experiences in the commercialization of bug-finding products, two positive experiences trump
them all. First, selling a static tool has
become dramatically easier in recent
years. There has been a seismic shift in
terms of the average programmer
"getting it." When you say you have a static
bug-finding tool, the response is no longer "Huh?" or "Lint? Yuck." This shift
seems due to static bug finders being in
wider use, giving rise to nice networking effects. The person you talk to likely
knows someone using such a tool, has a
competitor that uses it, or has been in a
company that used it.
Moreover, while seemingly vacuous
tautologies have had a negative effect
on technical development, a nice balancing empirical tautology holds that
bug finding is worthwhile for anyone
with an effective tool. If you can find
code, and the checked system is big
enough, and you can compile (enough
of) it, then you will always find serious
errors. This appears to be a law. We encourage readers to exploit it.
Acknowledgments
We thank Paul Twohey, Cristian Cadar,
and especially Philip Guo for their helpful, last-minute proofreading. The experience covered here was the work of
many. We thank all who helped build the
tool and company to its current state,
especially the sales engineers, support
engineers, and services engineers who
took the product into complex environments and were often the first to bear
the brunt of problems. Without them
there would be no company to document. We especially thank all the customers who tolerated the tool during
its transition from research quality to
production quality and the numerous
champions whose insightful feedback
helped us focus on what mattered.
References
1. Ball, T. and Rajamani, S.K. Automatically validating temporal safety properties of interfaces. In Proceedings of the Eighth International SPIN Workshop on Model Checking of Software (Toronto, Ontario, Canada). M. Dwyer, Ed. Springer-Verlag, New York, 2001, 103–122.
2. Bush, W., Pincus, J., and Sielaff, D. A static analyzer for finding dynamic programming errors. Software: Practice and Experience 30, 7 (June 2000), 775–802.
3. Coverity static analysis; http://www.coverity.com
4. Das, M., Lerner, S., and Seigle, M. ESP: Path-sensitive program verification in polynomial time. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (Berlin, Germany, June 17–19). ACM Press, New York, 2002, 57–68.
5. Edison Design Group. EDG C compiler front-end; http://www.edg.com
6. Engler, D., Chelf, B., Chou, A., and Hallem, S. Checking system rules using system-specific, programmer-written compiler extensions. In Proceedings of the Fourth Conference on Operating System Design & Implementation (San Diego, Oct. 22–25). USENIX Association, Berkeley, CA, 2000, 1–1.
7. Flanagan, C., Leino, K.M., Lillibridge, M., Nelson, G., Saxe, J.B., and Stata, R. Extended static checking for Java. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (Berlin, Germany, June 17–19). ACM Press, New York, 2002, 234–245.
8. Foster, J.S., Terauchi, T., and Aiken, A. Flow-sensitive type qualifiers. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (Berlin, Germany, June 17–19). ACM Press, New York, 2002, 1–12.
9. Hallem, S., Chelf, B., Xie, Y., and Engler, D. A system and language for building system-specific, static analyses. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (Berlin, Germany, June 17–19). ACM Press, New York, 2002, 69–82.
10. Hastings, R. and Joyce, B. Purify: Fast detection of memory leaks and access errors. In Proceedings of the Winter 1992 USENIX Conference (Berkeley, CA, Jan. 20–24). USENIX Association, Berkeley, CA, 1992, 125–138.
11. Xie, Y. and Aiken, A. Context- and path-sensitive memory leak detection. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (Lisbon, Portugal, Sept. 5–9). ACM Press, New York, 2005, 115–125.
Al Bessey, Ken Block, Ben Chelf, Andy Chou,
Bryan Fulton, Seth Hallem, Charles Henri-Gros,
Asya Kamsky, and Scott McPeak are current or former
employees of Coverity, Inc., a software company based
in San Francisco, CA; http://www.coverity.com
Dawson Engler (engler@stanford.edu) is an associate
professor in the Department of Computer Science and
Electrical Engineering at Stanford University, Stanford, CA,
and technical advisor to Coverity, Inc., San Francisco, CA.