Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista
DOI: 10.1145/1536616.1536639
By Christian Bird, Nachiappan Nagappan, Premkumar Devanbu, Harald Gall, and Brendan Murphy
Abstract
Existing literature on distributed development in software
engineering and other fields discusses various challenges,
including cultural barriers, expertise transfer difficulties, and
communication and coordination overhead. Conventional
wisdom, in fact, holds that distributed software development
is riskier and more challenging than collocated development.
We revisit this belief, empirically studying the overall development of Windows Vista and comparing the post-release
failures of components that were developed in a distributed
fashion with those that were developed by collocated teams.
We found a negligible difference in failures. This difference
becomes even less significant when controlling for the number of developers working on a binary. Furthermore, we also
found that component characteristics (such as code churn,
complexity, dependency information, and test code coverage) differ very little between distributed and collocated
components. Finally, we examine the software process used
during the Vista development cycle and discuss how it may
have mitigated some of the difficulties of distributed development identified in prior work in this area.
1. Introduction
Globally distributed software development is an increasingly
common strategic response to issues such as skill set availability, acquisitions, government restrictions, increased code
size, cost and complexity, and other resource constraints.4,9
In this paper, we examine development that is globally distributed, but completely within Microsoft. This style of global
development within a single company is to be contrasted with
outsourcing, which involves multiple companies. It is widely
believed that distributed collaboration involves challenges
not inherent in collocated teams, including delayed feedback,
restricted communication, less shared project awareness, difficulty of synchronous communication, inconsistent development and build environments, and lack of trust and confidence
between sites.20 While there are studies that have examined the
delays associated with distributed development and their direct
causes,11 there has been much less attention (see, e.g.,
Ramasubbu and Balan21) to the effect of distributed development on software quality in terms of post-release failures.
In this paper, we use historical development data from the
implementation of Windows Vista, along with post-release
failure information, to empirically evaluate the hypothesis
that globally distributed software development leads to more
failures. We focus on post-release failures at the level of individual executables and libraries (which we refer to as binaries)
shipped as part of the operating system and use the IEEE definition of a failure as “the inability of a system or component to
perform its required functions within specified performance
requirements.” Post-release failures are the most costly to
companies in terms of reputation and market share.
Using geographical and commit data for the developers that worked on Vista, we divide the Vista binaries into
those developed by (a) distributed and (b) collocated teams;
we then examine the distribution of post-release failures in
both populations. Binaries are classified as developed in
a distributed manner if at least 25% of the commits came
from locations other than where the binary's owner resides.
We find that there is a small (around 10%) increase in the
number of failures of binaries written by distributed teams
(hereafter referred to as distributed binaries) over those
written by collocated teams (collocated binaries). However,
when controlling for team size, the difference becomes negligible. In order to see if only smaller, less complex, or less
critical binaries are chosen for distributed development
(which could explain why distributed binaries have approximately the same number of failures), we examined many relevant properties of these binaries, but found no difference
between distributed and collocated binaries. We present
our methods and findings in this paper.
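The classification rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling used in the study: the function name `classify_binary`, the site labels, and the file names are hypothetical, and the granularity of a "location" (building, city, country) is left open here.

```python
from collections import Counter

# "at least 25% of the commits" is the threshold stated in the text.
DISTRIBUTED_THRESHOLD = 0.25


def classify_binary(owner_site, commit_sites, threshold=DISTRIBUTED_THRESHOLD):
    """Label a binary as 'distributed' or 'collocated'.

    owner_site   -- hypothetical label for the site where the binary's owner resides
    commit_sites -- one site label per commit made to the binary
    """
    if not commit_sites:
        raise ValueError("a binary needs at least one commit to be classified")
    counts = Counter(commit_sites)
    # Commits that came from any site other than the owner's site.
    remote_commits = len(commit_sites) - counts[owner_site]
    remote_share = remote_commits / len(commit_sites)
    return "distributed" if remote_share >= threshold else "collocated"


# Made-up commit histories for illustration only (not data from the study).
examples = {
    "a.dll": ("site_A", ["site_A"] * 9 + ["site_B"] * 1),  # 10% remote commits
    "b.exe": ("site_A", ["site_A"] * 3 + ["site_B"] * 1),  # 25% remote commits
}
labels = {
    name: classify_binary(owner, sites)
    for name, (owner, sites) in examples.items()
}
print(labels)  # {'a.dll': 'collocated', 'b.exe': 'distributed'}
```

Because the rule says "at least 25%", the comparison is inclusive, so a binary with exactly one quarter of its commits from other sites counts as distributed.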
2. Motivation and Contributions
Distributed software development is a general concept that
can be operationalized in various ways. Development may
be distributed along many dimensions with various distinctive characteristics.8 There are key questions that should
be clarified when discussing a distributed software project.
Who or what is distributed and at what level? Are people or
the artifacts distributed? Are people dispersed individually
or dispersed in groups?
A previous version of this article appeared in Proceedings
of the 31st International Conference on Software Engineering (May 2009).
It is important to consider the way that developers and
other entities are distributed. The distribution can be across