assigned them to the developers we
thought best able to resolve them.
The response was stunning: we were
greeted by near silence. We assigned
20–30 issues to developers, and almost
none of them were acted on. We had
worked hard to get the false positive
rate down to what we thought was less
than 20%, and yet the fix rate—the proportion of reported issues that developers resolved—was near zero.
Next, we switched Infer on at diff
time. The response of engineers was just
as stunning: the fix rate rocketed to over
70%. The same program analysis, with
same false positive rate, had much greater impact when deployed at diff time.
While this situation was surprising
to the static analysis experts on the
Infer team, it came as no surprise to
Facebook’s developers. Explanations
they offered us may be summarized in
the following terms:
One problem that diff-time deploy-
ment addresses is the mental effort of
context switch. If a developer is working
on one problem, and they are confront-
ed with a report on a separate problem,
then they must swap out the mental con-
text of the first problem and swap in the
second, and this can be time consum-
ing and disruptive. By participating as a
bot in code review, the context switch
problem is largely solved: program-
mers come to the review tool to dis-
cuss their code with human reviewers,
with mental context already swapped
in. This also illustrates how important
timeliness is: if a bot were to run for an
hour or more on a diff it could be too
late to participate effectively.
A second problem that diff-time de-
ployment addresses is relevance. When
an issue is discovered in the codebase,
it can be nontrivial to assign it to the
right person. In the extreme, somebody
who has left the company might have
caused the issue. Furthermore, even
if you think you have found someone
familiar with the codebase, the issue
might not be relevant to any of their
past or current work. But, if we com-
ment on a diff that introduces an issue
then there is a pretty good (but not per-
fect) chance that it is relevant.
Mental context switch has been
the subject of psychological studies, 12
and it is, along with the importance
of relevance, part of the received collective wisdom impressed upon us by
Facebook’s engineers. Note that others
have also remarked on the benefits of
reporting during code review. 17
At Facebook, we are working actively
on moving other testing technologies to
diff time when possible. We are also supporting academics on researching incremental fuzzing and symbolic execution
techniques for diff time reporting.
Interprocedural bugs. Many of the
bugs that Infer finds involve reasoning
that spans multiple procedures or files.
An example from OpenSSL illustrates:
apps/ca.c:2780: NULL _ DEREFERENCE
pointer ‘revtm’ last assigned on line
2778 could be null
and is dereferenced at line 2780, column 6
2778. revtm = X509 _ gmtime _ adj(NULL, 0);
2780. i = revtm->length + 1;
The issue is that the procedure
X509 _ gmtime _ adj() can return
null in some circumstances. Overall,
in Facebook’s internal CI system (Fig-
ure 1). Infer does not need to process
the entire codebase in order to analyze
a diff, and so is fast.
An aim has been for Infer to run in
15min–20min on a diff on average,
and this includes time to check out the
source repository, to build the diff, and
to run on base and (possibly) parent
commits. It has typically done so, but
we constantly monitor performance
to detect regressions that makes it
take longer, in which case we work to
bring the running time back down. After running on a diff, Infer then writes
comments to the code review system.
In the default mode used most often
it reports only regressions: new issues
introduced by a diff. The “new” issues
are calculated using a bug equivalence
notion that uses a hash involving the
bug type and location-independent
information about the error message,
and which is sensitive to file moves and
line number changes cause by refactoring, deleting, or adding code; the aim is
to avoid presenting warnings that developers might regard as pre-existing.
Fast reporting is important to keep in
tune with the developers’ workflows.
In contrast, when Infer is run in whole-program mode it can take more than an
hour (depending on the app)—too slow
for diff-time at Facebook.
Human factors. The significance of
the diff-time reasoning of Infer is best
understood by contrast with a failure.
The first deployment was batch rather
than continuous. In this mode Infer
would be run once per night on the
entire Facebook Android codebase,
and it would generate a list of issues.
We manually looked at the issues, and
Figure 2. A simple example capturing a common safety pattern used in Android apps.
Threading information is used to limit the amount of synchronization required. As a comment
from the original code explains: “mCount is written to only by the main thread with the lock held,
read from the main thread with no lock held, or read from any other thread with the lock held.”
Bottom: unsafe additions to Race WithMain Thread .java.