Analysis results that are shown at compilation time must meet a much
higher bar for quality and accuracy, one that some analyses cannot reach
even though they can still identify serious faults. After the review and code are checked in,
the friction confronting developers for
making changes increases. Developers
are thus hesitant to make additional
changes to code that has already been
tested and released, and lower severity
and less-important issues are unlikely
to be addressed. Other analysis projects at major software-development organizations (such as the Facebook
Infer analysis for Android/iOS apps7)
have also highlighted code review as a
key point for reporting analysis results.
Expand Analyzer Reach
As Google developer-users have
gained trust in the results from Tricorder analyzers, they continue to
request further analyses. Tricorder
addresses this in two ways: allowing project-level customization and
adding analysis results at additional
points in the developer workflow. In
this section, we also touch on the reasons Google does not yet leverage more
sophisticated analysis techniques as
part of its core developer workflow.
Project-level customization. Not all
requested analyzers are equally valuable throughout the Google codebase;
for example, some analyzers are associated with higher false-positive rates
and so would have correspondingly
high effective false-positive rates or
require specific project configuration
to be useful. These analyzers all have
value but only for the right team.
To satisfy these requests, we aimed
to make Tricorder customizable. Our
previous experience with customization for FindBugs did not end well; user-specific customization caused discrepancies within and across teams
and resulted in declining use of tools.
Because each user could see a different view of issues, there was no way to
ensure a particular issue was seen by
everyone working on a project. If developers removed all unused imports
from their team’s code, the fix would
quickly backslide if even a single other developer was not consistent about
removing unused imports.
To avoid such problems, Tricorder
allows configuration only at the project level, so everyone working on a
project sees a consistent view of the analysis results for that project. Reviewers typically approve a change only after
all their comments, manual and automated, have been addressed.
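The project-level (rather than per-user) configuration described here can be sketched as follows; the configuration format and every name in this sketch are hypothetical, not Tricorder's actual interface:

```python
# Hypothetical sketch of project-level analyzer configuration.
# PROJECT_CONFIG, the project paths, and analyzers_for_change are
# illustrative names, not Tricorder's real interface.

# Configuration is keyed by project, never by user, so every developer
# touching a given project sees an identical set of analysis findings.
PROJECT_CONFIG = {
    "//photos/app": {"enabled_analyzers": {"ErrorProne", "UnusedImports"}},
    "//maps/render": {"enabled_analyzers": {"ErrorProne"}},
}

DEFAULT_ANALYZERS = {"ErrorProne"}  # run for projects with no explicit config

def analyzers_for_change(project: str) -> set[str]:
    """Pick analyzers for a change based only on the project it touches."""
    cfg = PROJECT_CONFIG.get(project)
    return cfg["enabled_analyzers"] if cfg else DEFAULT_ANALYZERS
```

Because the lookup ignores who authored the change, the unused-imports scenario above cannot backslide: either every change to a project is checked for unused imports, or none is.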
Iterate on feedback from users. In
addition to the “Please fix” button,
Tricorder also provides a “Not useful”
button that reviewers or proposers can
click to express that they do not like the
analysis finding. Clicking automatically files a bug in the issue tracker,
routing it to the team that owns the analyzer. The Tricorder team tracks such
not-useful clicks, computing the ratio
of “Not useful” to “Please fix” clicks.
If the ratio for an analyzer exceeds
10%, the Tricorder team disables the
analyzer until the author(s) improve
it. While the Tricorder team has rarely
had to permanently disable an analyzer, it has on several occasions disabled an analyzer while its author
removed and revised subchecks that were particularly noisy.
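This feedback loop can be sketched as a simple threshold check (the 10% figure is from the text; the function and its signature are hypothetical):

```python
NOT_USEFUL_THRESHOLD = 0.10  # disable an analyzer above 10% "Not useful"

def should_disable(please_fix_clicks: int, not_useful_clicks: int) -> bool:
    """Return True when an analyzer's "Not useful" clicks exceed 10% of
    its "Please fix" clicks, the point at which it is disabled pending
    improvement by its authors. Illustrative sketch only."""
    if please_fix_clicks == 0:
        # No fixes applied at all: any complaint counts against the analyzer.
        return not_useful_clicks > 0
    return not_useful_clicks / please_fix_clicks > NOT_USEFUL_THRESHOLD
```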
The bugs being filed often lead to
improvement in the analyzers that in
turn greatly improves developers’ satisfaction with those analyzers; for example, the Error Prone team developed, in
2014, an Error Prone check that flagged
when too many arguments were being
passed to a printf-like function.19
The printf-like function did
not actually accept all printf specifiers, accepting only %s. About once per
week the Error Prone team would receive a “Not useful” bug claiming the
analysis was incorrect because the
number of format specifiers in the
bug filers’ code matched the number
of arguments passed. In every case,
the analysis was correct, and the user
was trying to pass specifiers other
than %s. The team thus changed the
diagnostic text to state directly that
the function accepts only the %s
placeholder and stopped getting bugs
filed about that check.
Scale of Tricorder. As of January 2018,
Tricorder analyzed approximately
50,000 code review changes per day.
During peak hours, there were three
analysis runs per second. Reviewers
clicked “Please fix” more than 5,000
times per day, and authors applied the
automated fixes approximately 3,000
times per day. And Tricorder analyzers
received “Not useful” clicks 250 times per day.
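As a quick sanity check on these figures (back-of-the-envelope arithmetic, not from the original text), the aggregate “Not useful” rate works out to about 5%, comfortably below the 10% per-analyzer disable threshold described earlier:

```python
# Daily Tricorder figures as of January 2018 (the "Please fix" count
# is reported as "more than 5,000", so this ratio is an upper bound).
please_fix_per_day = 5_000
not_useful_per_day = 250

aggregate_not_useful_ratio = not_useful_per_day / please_fix_per_day
assert aggregate_not_useful_ratio == 0.05  # 5%, below the 10% threshold
```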
The success of code-review analysis suggests it occupies a “sweet spot”
in the developer workflow at Google.
Even in a mature codebase with full test coverage and a rigorous code-review process, bugs slip by.