discover where people were having
problems understanding the codebase and then using those insights
to drive their training programs. We
ended up talking with at least another
dozen teams, and it was interesting
and surprising to learn about the different ways some of those teams had
used our data.
LP: What were some of the bigger
surprises?
CB: The biggest surprise for me was
learning that some teams would use our
tools to identify code reviews that took
too long or contained only a few comments. Then they would open the code
reviews flagged by that data, and those reviews would show them which code had
been changed and which parts of the code
were actually being reviewed. They would dig
into that and quickly determine, “Oh,
it looks like people are having a tough
time reviewing code that uses this particular API.” That’s how they would determine that their next training session
ought to be devoted to that API.
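The triage CB describes, filtering code reviews on elapsed time and comment count, could be sketched as follows. The record fields and thresholds here are illustrative assumptions, not the teams' actual tooling:

```python
from datetime import timedelta

# Illustrative thresholds; a real team would tune these (assumption).
MAX_DURATION = timedelta(days=2)
MIN_COMMENTS = 3

def flag_reviews(reviews):
    """Return reviews that took too long or drew too few comments.

    Each review is a dict with hypothetical keys:
      'id', 'duration' (timedelta), 'comment_count' (int).
    """
    return [
        r for r in reviews
        if r["duration"] > MAX_DURATION or r["comment_count"] < MIN_COMMENTS
    ]

reviews = [
    {"id": 1, "duration": timedelta(hours=5), "comment_count": 7},
    {"id": 2, "duration": timedelta(days=4), "comment_count": 1},
]
flagged = flag_reviews(reviews)  # only review 2 trips a threshold
```

From a list like `flagged`, a team lead could then open the individual reviews and look for a common culprit, such as an unfamiliar API.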
TC: Have you developed any metrics
for essentially grading the quality of
code reviews?
CB: Not as such, but I know some
teams have built live dashboards
around this data. Some development teams have mounted a massive
TV monitor right on the wall where
metrics like “Time since last bug” or
“Time to delivery of next release” can
be displayed. One team told us they
also put code-review data up on their
scoreboard so people could see how
many code reviews are on backlog or
how much time on average is required
to complete a code review. From what
they told us, it seems that having that
data up on a real-time dashboard,
mission-control style, has proved to be
quite motivating.
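The two scoreboard numbers CB mentions, backlog size and average completion time, are straightforward to derive from review open/close timestamps. This is a minimal sketch with an assumed record shape, not the team's dashboard code:

```python
from datetime import datetime, timedelta

# Hypothetical review records: 'opened' and an optional 'closed' timestamp.
reviews = [
    {"opened": datetime(2024, 1, 1, 9), "closed": datetime(2024, 1, 1, 17)},
    {"opened": datetime(2024, 1, 2, 9), "closed": datetime(2024, 1, 3, 9)},
    {"opened": datetime(2024, 1, 4, 9), "closed": None},  # still on backlog
]

def dashboard_stats(reviews):
    """Compute the backlog count and the average time to complete a review."""
    backlog = sum(1 for r in reviews if r["closed"] is None)
    done = [r["closed"] - r["opened"] for r in reviews if r["closed"]]
    avg = sum(done, timedelta()) / len(done) if done else None
    return backlog, avg

backlog, avg_time = dashboard_stats(reviews)
```

A real-time dashboard would simply recompute these figures on a refresh interval and render them on the wall-mounted display.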
Delivering a new set of capabilities for
managing and improving Microsoft’s
code-review process was the primary
goal right from the start. In the course
of accomplishing that, much was also
learned about certain general code-
review principles—guidelines that
might also be applied to beneficial
effect elsewhere. In fact, subsequent
research has offered surprising evidence of just how similar the impact
can be when many of these principles
are followed at companies other than
Microsoft—or, for that matter, by
open source projects.
LP: Looking back to when you first
started this project, what would you
say came up most whenever you questioned people about their primary motives for doing code reviews?
MG: We did a survey where we asked
people to rank their reasons. What
came out of that tended to be fairly
obvious: improving the code, finding
defects, knowledge transfer … that sort
of thing. But then, when we launched
this other study to categorize the comments that had been left in the actual
code, we found they only rarely aligned
with those stated motivations.
LP: Interesting. What did those comments chiefly focus on?
MG: There were a lot of comments
about the documentation, of course.
And you would see some remarks having to do with alternative solutions.
There also were comments about validation, which admittedly leaned in the
direction of bug resolution since people would say, “You know, if this particular corner case went away, you would
be able to eliminate some of these
problems.” People also had things to
say about API usage—and best practices as well. On the whole, I’d say these
sorts of comments far outweighed any
that focused on specific defects.
JC: To Michaela’s point regarding
this mismatch between expectations
and reality, despite the fact that people consistently said their primary
reason for doing code reviews was
to discover bugs in code, only 15% of
the comments we found in code actually related to bugs. For example, we
would find comments about control-flow issues or use of the wrong API—
or even use of the right API but in
the wrong way. On the other hand, at
least half of the comments were about
maintainability. So, it would seem
that for the reviewers themselves,
identifying maintainability issues
proves to be more of a priority than
uncovering bugs.
LP: Now that your work has been out
there for a number of years, what sort
of impact have you seen on code-review
policies and practices across all the different development teams?
JC: One of our top goals was to reduce the average time required to
do a code review. We looked
to discover where it was that people
seemed to be spending an inordinate
amount of time, and that is what led
to the creation of a reviewer recommender. It’s such a simple thing, really, but it can be hard to find people
with the right experience if you are part
of a large team. Having an automated
system to identify those engineers
who have some familiarity with the file
where some changes have been made
can help cut down on the time required
to get those changes reviewed.
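A minimal version of such a recommender might rank engineers by how often they appear in the change or review history of the files touched by the current change. The data shapes below are assumptions for illustration; the actual system JC describes is not shown here:

```python
from collections import Counter

def recommend_reviewers(changed_files, review_history, top_n=3):
    """Rank engineers by prior familiarity with the changed files.

    review_history: hypothetical list of (engineer, file) pairs drawn
    from past reviews and commits. A production system would weight
    recency and depth of involvement; this sketch just counts matches.
    """
    changed = set(changed_files)
    familiarity = Counter(eng for eng, f in review_history if f in changed)
    return [eng for eng, _ in familiarity.most_common(top_n)]

history = [
    ("alice", "parser.c"), ("alice", "lexer.c"),
    ("bob", "parser.c"),
    ("carol", "ui.c"),
]
recs = recommend_reviewers(["parser.c"], history)
```

Even this crude counting heuristic narrows a large team down to the handful of people who have actually touched the files in question.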
Something else we’ve done, quite
recently, is to give the developers a
way to explain what it was they were
trying to accomplish. This is because
a complaint we commonly hear from
reviewers is that it can be quite challenging to understand the reasoning
behind a code change. Which is to say
they would like some way to get into
the mindset of the person who made
that change so they can better understand whether it actually makes any
sense or not.
One way of dealing with this is to
show more than just the isolated section of code where a change has been
made. Instead, we show entire files
so reviewers can get a better sense of
the code around each change. We also
wanted to provide some means for the
author of a change to offer additional
information so reviewers could better
understand their reasoning. Toward
that end, our system now lets authors
put tags on files and regions to indicate which files are at the heart of a
change and so should probably be given particular attention. For example,
the tags can be used to quickly indicate which changes have been made
to test cases as opposed to the product
code. Or they can be used to call out
certain files or changes with potential
security implications.
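One plausible use of such tags is to order the files in a review so that the security-sensitive and core changes surface first and test-only changes last. The tag names and priority scheme here are illustrative assumptions, not the actual tool's schema:

```python
def order_for_review(files):
    """Sort changed files by review priority based on author-supplied tags.

    'files' is a list of (path, set_of_tags) pairs; the tag vocabulary
    ('security', 'core', 'test') is hypothetical.
    """
    def priority(item):
        path, tags = item
        if "security" in tags:
            return 0  # potential security implications: review first
        if "core" in tags:
            return 1  # heart of the change
        if "test" in tags:
            return 3  # test-case changes: review last
        return 2      # everything else in between
    return sorted(files, key=priority)

files = [
    ("tests/test_api.py", {"test"}),
    ("src/auth.py", {"core", "security"}),
    ("src/util.py", set()),
]
ordered = order_for_review(files)
```

The same tags could equally drive filtering or color-coding in the review UI rather than ordering; the point is that author-supplied intent becomes machine-readable.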
LP: Do you have any other new capabilities in the works?
JC: The fundamental underlying factor we’re trying to address is the size of
code reviews since that affects both the
time required to produce a review and
the usefulness of the comments that
come out of it. It’s a difficult problem
to address because some of the issues
are cultural in nature, and some relate