Table 1. Locations, code size, and defect density from Motorola’s
3G trial project for each site.
Arlington Heights, USA
Arlington Heights, USA
Nagappan et al. investigated the influence of organizational structure on software quality in Windows Vista.
They found a strong relationship between how development is
distributed across the organizational structure and the number
of post-release failures in binaries shipped with the operating system. Along with other organizational measures, they
measured the level of code ownership by the organization
that the binary owner belonged to, the number of organizations that contributed at least 10% to the binary, and the
organizational level of the person whose reporting engineers
perform more than 75% of the edits. Our paper complements this study by examining geographically, rather than
organizationally, distributed development.
4. METHODS AND ANALYSIS
In this section, we describe our methods of gathering data for
our study and the analysis methods used to evaluate our hypotheses regarding distributed development in Windows Vista.
4.1. Data collection
Windows Vista is a large commercial software project involving a few thousand developers. It comprises thousands of
binaries (defined as individual files containing machine code
such as executables or libraries) with a source code base
of tens of millions of LOC. Developers were distributed across
59 buildings and 21 campuses in Asia, Europe, and North
America. Vista was developed completely in-house without
any outsourced elements.
Our data focuses on three properties: code quality, geographical location, and code ownership. Our measure of code
quality is post-release failures, since these matter most to
end-users, cost the most to fix, and affect product and company reputation. These failures are recorded for the 6 months
following the release of Vista at the binary level.
The geographical location of each software developer at
Microsoft is obtained from the people management software
at the time of release to manufacturing of Vista. This data
includes the building, campus, region, country, and continent
information. While some developers occasionally move, it is
standard practice at Microsoft to keep a software engineer at
one location during an entire product cycle. Most of the developers of Vista didn’t move during the observation period.
Finally, we gathered the number of commits made by each
engineer to each binary. We remove build engineers from the
88 COMMUNICATIONS OF THE ACM | AUGUST 2009 | VOL. 52 | NO. 8
analysis because their changes are broad, but not substantive. Many files have fields that need to be updated prior to a
build, but the actual source code is not modified. By combining this data with developer geographical data, we determine
the level of distribution of each binary and categorize these
levels into a hierarchy. Microsoft practices a strong code ownership development process. We found that on average, 49%
of the commits for a particular binary can be attributed to one
engineer. Although we are basing our analysis on data from
the development phase, in most cases, this is indicative of the
distribution that was present during the design phase as well.
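As a rough illustration of the ownership figure above, the following minimal sketch computes, for each binary, the share of commits made by its single most active engineer after dropping build engineers, and then averages that share across binaries. The record layout, the `BUILD_ENGINEERS` set, and all names and sample values are hypothetical; the paper does not describe its tooling.

```python
from collections import Counter, defaultdict

# Hypothetical commit records: one (binary, engineer) pair per commit.
commits = [
    ("a.dll", "alice"), ("a.dll", "alice"), ("a.dll", "bob"), ("a.dll", "build1"),
    ("b.exe", "carol"), ("b.exe", "dave"), ("b.exe", "carol"), ("b.exe", "carol"),
]

# Hypothetical set of build engineers, excluded from the analysis.
BUILD_ENGINEERS = {"build1"}


def top_owner_share(commits, excluded=frozenset()):
    """Return {binary: fraction of commits made by its most active engineer}."""
    per_binary = defaultdict(Counter)
    for binary, engineer in commits:
        if engineer not in excluded:
            per_binary[binary][engineer] += 1
    return {
        binary: max(counts.values()) / sum(counts.values())
        for binary, counts in per_binary.items()
    }


shares = top_owner_share(commits, BUILD_ENGINEERS)
# Average of the per-binary top-owner shares (the paper reports 49% for Vista).
mean_share = sum(shares.values()) / len(shares)
```

On the made-up data, `a.dll` has a top-owner share of 2/3 and `b.exe` of 3/4; the build engineer's commit does not count toward either.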
We categorized the distribution of binaries into the following geographic levels. Our reasoning behind this classification is explained below.
Building: Developers who work in the same building (and
often the same floor) will enjoy more face-to-face and informal contact. A binary classified at the building level may
have been worked on by developers on different floors of the same building.
Cafeteria: Several buildings share a cafeteria. One cafeteria
serves between one and five nearby buildings. Developers
in different but nearby buildings may share meals
together or meet by chance during meal times. In addition, the typically shorter geographical distance facilitates
Campus: A campus represents a group of buildings in one
location. For instance, in the United States, there are multiple campuses. Some campuses are located in the same city.
It is easy to travel between buildings on the same campus by
foot while travel between campuses (even in the same city)
requires a vehicle.
Locality: We use localities to represent groups of geographically proximate campuses. For instance, the Seattle locality
contains all of the campuses in western Washington. One
can travel within a locality by car on day trips, but travel
between localities often requires air travel and multi-day
trips. Also, all sites in a particular locality operate in the
same time zone, making coordination and communication
within a locality easier than between localities.
Continent: All of the locations on a given continent fall into
this category. We choose to group at the continent level
rather than the country level because Microsoft has offices
in Vancouver, Canada, and we wanted those to be grouped
together with other west coast sites (Seattle to Vancouver is
less than 3 h by road). If developers are located in the same
continent, but not the same locality, then it is likely that cultural similarities exist, but they operate in different time
zones and rarely speak face to face.
World: Binaries developed by engineers on different continents are placed in this category. This level of geographical
distribution means that face-to-face meetings are rare and
synchronous communication such as phone calls or online
chats is hindered by time differences. Also, cultural and
language differences are more likely.
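The levels above form a hierarchy ordered from least to most dispersed. The following is a minimal sketch of how a binary might be classified against it, using the 75% commit threshold that the paper states just after this list. It assumes each commit carries a location record with one field per level, and that a level qualifies when a single entity at that level (one building, one cafeteria group, and so on) accounts for at least 75% of the commits. That reading, the field names, and the sample identifiers are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

# Levels ordered from least to most geographically dispersed.
LEVELS = ["building", "cafeteria", "campus", "locality", "continent"]
THRESHOLD = 0.75  # minimum share of commits that must come from one entity


def classify(commit_locations, threshold=THRESHOLD):
    """Return the lowest level at which one entity holds >= threshold of commits.

    commit_locations: one dict per commit, mapping each level name to the
    identifier of the entity in which the committing developer sits.
    Returns "world" when no single continent reaches the threshold.
    """
    total = len(commit_locations)
    if total == 0:
        raise ValueError("binary has no commits")
    for level in LEVELS:
        counts = Counter(loc[level] for loc in commit_locations)
        if max(counts.values()) / total >= threshold:
            return level
    return "world"


# Hypothetical locations; every identifier here is made up for illustration.
B1 = {"building": "B1", "cafeteria": "C1", "campus": "K1",
      "locality": "loc1", "continent": "cont1"}
B2 = {"building": "B2", "cafeteria": "C1", "campus": "K1",
      "locality": "loc1", "continent": "cont1"}
F1 = {"building": "F1", "cafeteria": "C9", "campus": "K9",
      "locality": "loc9", "continent": "cont9"}

same_building = classify([B1, B1, B1, B2])     # 75% of commits from building B1
shared_cafeteria = classify([B1, B1, B2, B2])  # two buildings sharing one cafeteria
cross_continent = classify([B1, B1, F1, F1])   # evenly split across two continents
```

Walking the levels from the bottom up means the first level that meets the threshold is also the lowest one, which matches the "lowest level in the hierarchy" wording of the rule.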
For every level of geographical dispersion there are more
than two entities from the lower level within that level. That
is, Vista was developed in more than two continents, localities, etc. Each binary is assigned the lowest level in the hierarchy from which at least 75% of the commits were made. Thus,