over binaries developed in the same building. Even so, the
actual effects are relatively minor (4% and 6%, respectively).
[Table, partially recovered. Columns: Variable, Increase, Standard error. Row diff_localities: 1.23. Model 4: F statistic = 242.73, p < .0005. Remaining p-values, in extraction order: p < .0005, p = .493, p = .016, p = .019, p = .457, p = .101, p < .0005.]
Two important observations can be made from these
models. The first is that the variance explained by the predictor variables, as measured by the adjusted R² value (not
shown) for the built models rises from 2% and 4% (models
1 and 3) to 33% (models 2 and 4) when adding the number of
developers. The second is that when controlling for the number of developers, not all levels of distribution show a significant effect, but the increase in post-release failures for those
that do is minimal, with values at or below 6%. To put this into
perspective, a binary with 4 failures if collocated would have
4.24 failures if distributed. Although our response variable is
different from Herbsleb and Mockus, our findings are consistent with their result that when controlling for the number
of people working on a development task, distribution does
not have a large effect. Based on these results, we are unable
to reject the null hypothesis and H1 is not confirmed.
This leads to the surprising conclusion that in the context
in which Windows Vista was developed, teams that were distributed wrote code that had virtually the same number of
post-release failures as those that were collocated.
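The arithmetic behind the "4 failures versus 4.24 failures" comparison can be sketched as follows. This is a minimal illustration that assumes the reported percentage acts as a multiplicative increase on the collocated failure count; the function name `expected_failures` is hypothetical and not part of the study.

```python
def expected_failures(baseline_failures: float, percent_increase: float) -> float:
    """Scale a baseline failure count by a percent increase.

    Assumption: the reported effect (e.g. 6%) multiplies the
    collocated failure count.
    """
    return baseline_failures * (1.0 + percent_increase / 100.0)


collocated = 4  # failures for a collocated binary (example from the text)
distributed = expected_failures(collocated, 6.0)  # upper bound on the effect
print(f"{distributed:.2f}")  # prints 4.24
```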
4.3. Differences in binaries
One possible explanation for this lack of difference in failures could be that distributed binaries are smaller, less complex, have fewer dependencies, etc. Although the number of
failures changes only minimally when the binaries are distributed, we are interested in the differences in characteristics between distributed and collocated binaries. We examined these differences
to determine if informed decisions were made about
which binaries should be developed in a distributed manner. For instance, prior work has shown that the number of
failures is highly correlated with code complexity and number of dependencies.17,24 Therefore, it is possible that only
less complex binaries or those with fewer dependents were
chosen for distribution in an effort to mitigate the perceived
dangers of distributed development.
We gathered metrics for each of the binaries in an attempt
to determine if there is a difference in the nature of binaries that are distributed. These measures fall into five broad
categories.
Size and complexity: Our code size and complexity measures include number of independent paths through the
code, number of functions, classes, parameters, blocks,
lines, local and global variables, and cyclomatic complexity.
From the call graph we extract the fan in and fan out of each
function. For object oriented code we include measures
of class coupling, inheritance depth, the number of base
classes, subclasses and class methods, and the number of
public, protected, and private data members and methods.
All of these are measured as totals for the whole binary and
as maximums on a function or class basis as applicable.
Code churn: As measures of code churn we examine the change
in size of the binary, the frequency of edits and the churn size
in terms of lines removed, added, and modified from the beginning of Vista development until release to manufacturing.
Test coverage: The number of blocks and arcs as well as the
block coverage and arc coverage are recorded during the
testing cycle for each binary.
Dependencies: Many binaries have dependencies on one
another (in the form of method calls, data types, registry values that are read or written, etc.). We calculate the number
of direct incoming and outgoing dependencies as well as the
transitive closure of these dependencies. The depth in the
dependency graph is also recorded.
People: We include a number of statistics on the people and
organizations that worked on the binaries. These include
all of the metrics in our prior organizational metrics paper18
such as the number of engineers that worked on the binary.
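As an illustration of the dependency measures listed above, the following sketch counts direct and transitive incoming and outgoing dependencies for one binary. It assumes the dependency graph is available as a mapping from each binary to the set of binaries it depends on; the binary names and helper functions are hypothetical, and depth in the graph is left out because the text does not define how it is measured.

```python
from collections import deque


def transitive_closure(graph, start):
    """Return every node reachable from `start` by following edges.

    `start` itself is included only if it lies on a cycle.
    """
    seen = set()
    queue = deque(graph.get(start, ()))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, ()))
    return seen


def reverse(graph):
    """Flip every edge so that outgoing dependencies become incoming ones."""
    rev = {node: set() for node in graph}
    for src, targets in graph.items():
        for dst in targets:
            rev.setdefault(dst, set()).add(src)
    return rev


def dependency_counts(graph, binary):
    """Direct and transitive dependency counts for a single binary."""
    incoming = reverse(graph)
    return {
        "direct_out": len(graph.get(binary, ())),
        "direct_in": len(incoming.get(binary, ())),
        "transitive_out": len(transitive_closure(graph, binary)),
        "transitive_in": len(transitive_closure(incoming, binary)),
    }


# Hypothetical chain: app.exe -> ui.dll -> core.dll -> net.dll
deps = {
    "app.exe": {"ui.dll"},
    "ui.dll": {"core.dll"},
    "core.dll": {"net.dll"},
    "net.dll": set(),
}
print(dependency_counts(deps, "app.exe"))
print(dependency_counts(deps, "net.dll"))
```

In this toy graph, app.exe has one direct outgoing dependency but three transitive ones, which is why the two counts are recorded separately.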
We began with a manual inspection of the 20 binaries
with the fewest and the 20 binaries with the most post-release failures in both the distributed and collocated categories and examined the values of the metrics described above.
The only discernible differences were metrics relative to the
number of people working on the code, such as team size.
[Table, partially recovered. Variable: number of devs. p-values, in extraction order: p < .0005, p < .0005, p = .033, p < .0005, p = .363, p < .0005, p < .0005.]
We evaluated the effect of these metrics on level of distribution in the entire population by examining the Spearman
rank correlation of distribution level of binaries (not limited
to the “top 20” lists) with the code metrics. Most metrics
had correlation levels below 0.1 and the few that were above
that level, such as number of engineers, never exceeded
0.25. Logistic regression was used to examine the relationship of the development metrics with distribution level. The
increase in classification accuracy between a naive model
including no independent variables and a stepwise refined
model with 15 variables was only 4%. When removing data
related to people that worked on the source, the refined
model’s accuracy only improved 2.7% from the naive model.
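A Spearman rank correlation of the kind used above can be sketched in a few lines: rank both variables (averaging ranks for ties), then take the Pearson correlation of the ranks. This is a generic illustration rather than the authors' analysis code. The integer encoding of distribution level and the sample values are made up for the example.

```python
from math import sqrt


def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up data: an ordinal distribution level per binary (assumed encoding)
# and the number of engineers who worked on it.
distribution_level = [0, 0, 1, 1, 2, 3, 4]
num_engineers = [3, 5, 4, 9, 8, 12, 20]
rho = spearman(distribution_level, num_engineers)
print(round(rho, 3))
```

Because only ranks enter the calculation, the measure is suited to an ordinal variable such as distribution level, where the spacing between levels has no numeric meaning.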
We include the average values for a representative sample
of the metrics along with a Spearman rank correlation with
the level of distribution for the binaries and the significance
of the correlation. Although the p-values are quite low, the
magnitude of the correlation is small. The low p-values are attributable to
the very large sample of binaries (over 3,000).
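The interplay between sample size and significance can be illustrated with a rough calculation. The sketch below assumes the common approximation t = r * sqrt((n - 2) / (1 - r^2)) for testing a correlation coefficient and replaces Student's t distribution with a normal distribution, which is reasonable for large n but crude for small n. The values r = 0.1 and n = 3000 echo the magnitudes mentioned in the text; n = 30 is an arbitrary contrast, and the function name is hypothetical.

```python
from math import erfc, sqrt


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a correlation r over n samples.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) with a normal approximation
    to the t distribution (an assumption that is accurate only for large n).
    """
    t = r * sqrt((n - 2) / (1.0 - r * r))
    return erfc(abs(t) / sqrt(2.0))


p_large = correlation_p_value(0.1, 3000)  # weak correlation, large sample
p_small = correlation_p_value(0.1, 30)    # same correlation, small sample
print(p_large < 0.0005, p_small > 0.05)
```

The same weak correlation is highly significant with thousands of binaries and far from significant with a few dozen, so a low p-value here says little about the strength of the relationship.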