a tool wastes developer time with false
positives and low-priority issues, developers will lose faith and ignore results.
Do not just find bugs, fix them. To
sell a static analysis tool, a typical approach is to enumerate a significant
number of issues that are present in
a codebase. The intent is to influence
decision makers by indicating a potential ability to correct the underlying bugs or prevent them in the future.
However, that potential will remain
unrealized if developers are not incentivized to act. This is a fundamental
flaw: analysis tools measure their utility by the number of issues they identify, while integration attempts fail
due to the low number of bugs actually fixed or prevented. Instead, Google
static analysis teams take responsibility for fixing, as well as finding, bugs,
and measure success accordingly.
Focusing on fixing bugs has ensured
that tools provide actionable advice30
and minimize false positives. In many
cases, fixing bugs is as easy as finding
them through automated tooling. Even
for difficult-to-fix issues, research over
the past five years has highlighted new
techniques for automatically creating
fixes for static analysis issues.22,28,31
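The idea of pairing a finder with an automatic fix can be sketched in a few lines. The following is a toy illustration, not any Google tool's API; the class name, pattern, and rewrite are ours:

```java
public class LengthZeroFixer {
    // Hypothetical check-and-fix: a regex finds the bug pattern
    // "x.length() == 0" and the same rule emits the replacement
    // "x.isEmpty()", so every finding comes with a ready-made fix.
    static String fix(String line) {
        return line.replaceAll("(\\w+)\\.length\\(\\)\\s*==\\s*0", "$1.isEmpty()");
    }

    public static void main(String[] args) {
        // Flagged line is rewritten; clean lines pass through unchanged.
        System.out.println(fix("if (name.length() == 0) return;"));
        // prints: if (name.isEmpty()) return;
    }
}
```

Real tools work on ASTs rather than text, but the principle is the same: the knowledge needed to detect the pattern is usually enough to repair it.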
Crowdsource analysis development.
Although typical static analysis tools
require expert developers to write the
analyses, experts may be scarce and
not actually know what checks will
have the greatest impact. Moreover,
analysis experts are typically not domain experts (such as those working
with APIs, languages, and security).
With FindBugs integration, only a
small number of Googlers understood how to write new checks, so
the small BugBot team had to do all
the work themselves. This limited
the velocity of adding new checks
and prevented others from contributing their domain knowledge. Teams
like Tricorder now focus on lowering the bar to developer-contributed
checks, without requiring prior static
analysis experience. For example, the
Google tool Refaster37 allows developers to write checks by specifying
example before and after code snippets. Since contributors are frequently motivated to contribute after debugging faulty code themselves, new
checks are biased toward those that
save developer time.
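A Refaster-style template consists of exactly such before-and-after snippets. In the sketch below the error-prone annotations @BeforeTemplate and @AfterTemplate are shown as comments so the class compiles stand-alone; the template itself is illustrative:

```java
public class StringIsEmptyTemplate {
    // @BeforeTemplate -- the example snippet to match in the codebase
    static boolean before(String s) {
        return s.length() == 0;
    }

    // @AfterTemplate -- the snippet to suggest as a replacement
    static boolean after(String s) {
        return s.isEmpty();
    }

    public static void main(String[] args) {
        // The two snippets are behaviorally equivalent,
        // which is what makes the rewrite safe to apply.
        System.out.println(before("") == after(""));   // true
        System.out.println(before("x") == after("x")); // true
    }
}
```

Because the contributor writes only ordinary code, no static analysis expertise is required to add a check.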
a scenario in which analysis results
for a particular code review would require analyzing the entire repository.
Although Facebook’s Infer7, 25 focuses
on compositional analysis in order to
scale separation-logic-based analysis
to multimillion-line repositories, scaling such analysis to Google’s multibillion-line repository would still take significant engineering effort.
As of January 2018, implementing
a system to do more sophisticated
analyses has not been a priority for Google, for several reasons:
Large investment. The up-front infrastructure investment would be prohibitive;
Work needed to reduce false-positive
rates. Analysis teams would have to
develop techniques to dramatically
reduce false-positive rates for many
research analyzers and/or severely restrict which errors are displayed, as Infer does;
Still more to implement. Analysis
teams still have plenty more “simple” analyzers to implement and integrate; and
High upfront cost. We have found the
utility of such “simple” analyzers to be
high, a core motivation of FindBugs.
In contrast, even determining the cost-benefit ratio for more complicated
checks has a high up-front cost.
Note this cost-benefit analysis may
be very different for developers outside
of Google working in specialized fields
(such as aerospace13 and medical devices21) or on specific projects (such as
device drivers4 and phone apps7).
Our experience attempting to integrate
static analysis into Google’s workflow
taught us valuable lessons:
Finding bugs is easy. When a codebase is large enough, it will contain
practically any imaginable code pattern. Even in a mature codebase with
full test coverage and a rigorous code-review process, bugs slip by. Sometimes the problem is not obvious from
local inspection, and sometimes bugs
are introduced by seemingly harmless
refactorings. For example, consider the
following code snippet hashing a field
f of type long:

result = 31 * result + (int) (f ^ (f >>> 32));
Now consider what happens if the
developer changes the type of f to int.
The code continues to compile, but the
right shift by 32 becomes a no-op, the
field is XORed with itself, and the hash
for the field becomes a constant 0. The
result is f no longer affects the value
produced by the hashCode method.
The right shift by more than 31 is statically detectable by any tool able to compute the type of f, yet we fixed 31 occurrences of this bug in Google’s codebase
while enabling the check as a compiler
error in Error Prone.
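The failure mode is easy to reproduce. The following self-contained demonstration (class and method names are ours) shows the same expression hashing correctly for a long field and collapsing to a constant after the field type changes to int:

```java
public class HashShiftDemo {
    // With a long field, (f >>> 32) yields the high 32 bits, so the
    // XOR mixes both halves of the value into the hash.
    static int hashLong(long f) {
        return (int) (f ^ (f >>> 32));
    }

    // After the field type changes to int, the same expression still
    // compiles, but Java masks the shift distance to its low five bits:
    // 32 % 32 == 0, so (f >>> 32) == f, and f ^ f == 0 for every value.
    static int hashInt(int f) {
        return (int) (f ^ (f >>> 32)); // always 0
    }

    public static void main(String[] args) {
        System.out.println(hashLong(0x12345678_9ABCDEF0L)); // mixes both halves (nonzero here)
        System.out.println(hashInt(123456)); // prints 0
        System.out.println(hashInt(-1));     // prints 0
    }
}
```

Because the shift distance is masked rather than rejected, only a tool that knows the static type of f can flag the bug.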
Since finding bugs is easy, Google uses simple tooling to detect bug patterns. Analysis writers then tune the
checks based on results from running
over Google code.
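To make "simple tooling" concrete, here is a toy sketch of a pattern-based checker; the pattern and names are illustrative, not Error Prone's implementation, and real checkers match against ASTs rather than raw text:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SimpleChecker {
    // Hypothetical bug pattern: comparing a string to a literal with ==,
    // which tests reference identity rather than equality.
    private static final Pattern STRING_REF_EQ = Pattern.compile("==\\s*\"");

    // Returns the 1-based line numbers that match the pattern.
    static List<Integer> check(List<String> lines) {
        List<Integer> warnings = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (STRING_REF_EQ.matcher(lines.get(i)).find()) {
                warnings.add(i + 1);
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        List<String> src = List.of(
            "if (name == \"admin\") {",       // flagged
            "if (name.equals(\"admin\")) {"); // ok
        System.out.println(check(src)); // prints [1]
    }
}
```

Running such a checker over a large codebase and inspecting the hits is precisely the tuning loop described above.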
Most developers will not go out of their
way to use static analysis tools. Following
in the footsteps of many commercial
tools, Google’s initial implementation of
FindBugs relied on engineers choosing
to visit a central dashboard to see the issues found in their projects, though few
of them actually made such a visit. Finding bugs in checked-in code (that may already be deployed and running without
user-visible problems) is too late. To ensure that most or all engineers see static-analysis warnings, analysis tools must
be integrated into the workflow and enabled by default for everyone. Instead of
providing bug dashboards, projects like
Error Prone extend the compiler with
additional checks, and surface analysis
results in code review.
Developer happiness is key. In our
experience and in the literature, many
attempts to integrate static analysis
into a software-development organization fail. At Google, there is typically
no mandate from management that
engineers use static analysis tools.
Engineers working on static analysis
must demonstrate impact through
hard data. For a static analysis project
to succeed, developers must feel they
benefit from and enjoy using it.
To build a successful analysis platform, we focus on tools that deliver high value for developers. The Tricorder team keeps careful accounting
of issues fixed, performs surveys to understand developer sentiment, makes
it easy to file bugs against the analysis tools, and uses all this data to justify continued investment. Developers
need to build trust in analysis tools. If