some mappings to determine which
ones were thought to be the most interesting or useful. It turns out those
generally proved to be comments that identified functional issues, pointed
out missing validation checks, or offered suggestions related to API usage
or best practices.
LP: Just for context, can you also
speak to the scale of this research—the
size of the codebase you were working
with, the number of code reviews you
analyzed, or the number of developers
who were involved?
CB: We did a number of different
studies, many of which were more
quantitative than observational. In one
case, we did an initial study where it
became clear that the depth of knowledge someone has of a certain piece
of code will definitely show up in the
quality of feedback they are able to offer as a reviewer. Which is to say, to get
higher-quality comments, you need
reviews from people who have some
experience with that particular piece of
software. Then, to check out that conclusion, we spoke with and observed
some engineers who had submitted reviews for code already familiar to them.
We also observed some engineers who
had been asked to review code they had
no prior experience with. That was a
small study, but it left us with some
definite impressions.
There also were those studies Michaela just mentioned, where we
considered comment usefulness. That
was based on data gathered from
across all of Microsoft and then fed
into a machine-learning classifier we
had built to categorize code reviews.
We ended up using that to classify
three million reviews of code that had
been written by tens of thousands
of developers and drawn from every
codebase across the whole of Microsoft—meaning we are easily talking
about hundreds of millions of lines of
code. Obviously, the quantitative data
analysis we were able to perform there
was based on a substantial amount of
data. The qualitative observational
studies, on the other hand, were typically much smaller.
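[For readers curious what a comment-usefulness classifier of the kind CB describes might look like, here is a minimal sketch—not Microsoft's actual system. It trains a tiny multinomial naive Bayes model on a handful of invented review comments labeled "useful" (functional issues, missing validation, API suggestions) or "not useful" (style nits, acknowledgments), then labels an unseen comment. All comments and labels below are hypothetical examples.]

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Tiny multinomial naive Bayes text classifier (illustrative sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.priors = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(tokenize(doc))
        self.vocab = {w for c in self.classes for w in self.counts[c]}
        return self

    def predict(self, doc):
        # Score each class: log prior plus Laplace-smoothed log likelihood
        # of every word in the comment that appeared in training.
        scores = {}
        for c in self.classes:
            total = sum(self.counts[c].values()) + len(self.vocab)
            scores[c] = self.priors[c] + sum(
                math.log((self.counts[c][w] + 1) / total)
                for w in tokenize(doc) if w in self.vocab
            )
        return max(scores, key=scores.get)

# Invented example comments -- not Microsoft's data or categories verbatim.
comments = [
    "this will throw a NullReferenceException if the list is empty",
    "you should validate the user input before passing it to the parser",
    "consider using the async overload of this API here",
    "off-by-one error: the loop should start at index 0",
    "nit: extra blank line",
    "looks good to me",
    "thanks, done",
    "please fix the indentation on this line",
]
labels = ["useful"] * 4 + ["not useful"] * 4

nb = NaiveBayes().fit(comments, labels)
print(nb.predict("validate the input before parsing"))  # -> useful
print(nb.predict("looks good"))                         # -> not useful
```

[A production classifier trained on millions of reviews would of course use richer features and models; the point here is only the shape of the task: labeled comments in, a usefulness category out.]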
MG: We definitely had a tremendous amount of data available—
essentially all the code written for
Office, Windows, Windows Phone,
Azure, and Visual Studio, as well as
many smaller projects.
JC: We also enjoy an advantage here
at Microsoft in that we have so many
different product types. We look at the
work people do on operating systems,
as well as apps and large-scale services
and small-scale services and everything in between. We are very aware of
the different demands in each of these
areas, and we make a point of keeping
that in mind as we do our studies.
LP: In those cases where you could
derive data from the use of CodeFlow,
were you also able to further instrument the tool to augment your studies?
JC: One of the most interesting
things to surface from instrumenting CodeFlow was just how much
time people were actively spending
in the review tool. That’s because