by teams that need to review an ongoing stream of simple refactorings resulting from codebase-wide clean-ups
and centralized modernization efforts.
Alternatives
As the popularity and use of distributed version control systems (DVCSs)
like Git have grown, Google has considered whether to move from Piper
to Git as its primary version-control
system. A team at Google is focused
on supporting Git, which is used by
Google’s Android and Chrome teams
outside the main Google repository.
The use of Git is important for these
teams due to external partner and open
source collaborations.
The Git community strongly prefers that developers work with
more, smaller repositories. A Git clone operation requires copying all
repository content to the developer’s local machine, a procedure incompatible with a large repository. To move to Git-based source
hosting, it would be necessary to split
Google’s repository into thousands of
separate repositories to achieve reasonable performance. Such reorganization
would necessitate cultural and workflow changes for Google’s developers.
As a comparison, Google’s Git-hosted
Android codebase is divided into more
than 800 separate repositories.
Given the value gained from the existing tools Google has built and the
many advantages of the monolithic
codebase structure, it is clear that moving to more and smaller repositories
would not make sense for Google’s
main repository. The alternative of
moving to Git or any other DVCS that
would require repository splitting is
not compelling for Google.
Current investment by the Google
source team focuses primarily on the
ongoing reliability, scalability, and
security of the in-house source systems.
The team is also pursuing an
experimental effort with Mercurial
(http://mercurial.selenic.com/),
an open source DVCS similar to Git.
The goal is to add scalability features
to the Mercurial client so it can
efficiently support a codebase the
size of Google’s. This would provide
Google’s developers with an alternative
of using popular DVCS-style workflows
in conjunction with the central
Tech Leads of CitC; Hyrum Wright,
Google’s large-scale refactoring guru;
and Chris Colohan, Caitlin Sadowski,
Morgan Ames, Rob Siemborski, and
the Piper and CitC development and
support teams for their insightful
review comments.
Rachel Potvin (rpotvin@google.com) is an engineering
manager at Google, Mountain View, CA.
Josh Levenberg (joshl@google.com) is a software
engineer at Google, Mountain View, CA.
Copyright held by the authors
repository. This effort is in collaboration with the open source Mercurial
community, including contributors
from other companies that value the
monolithic source model.
Conclusion
Google chose the monolithic-source-management strategy in 1999 when
the existing Google codebase was
migrated from CVS to Perforce. Early
Google engineers maintained that a
single repository was strictly better
than splitting up the codebase, though
at the time they did not anticipate the
future scale of the codebase and all
the supporting tooling that would be
built to make the scaling feasible.
Over the years, as the investment required to continue scaling the centralized repository grew, Google leadership occasionally considered whether
it would make sense to move away from the
monolithic model. Despite the effort
required, Google repeatedly chose to
stick with the central repository due to
its advantages.
The monolithic model of source
code management is not for everyone.
It is best suited to organizations like
Google, with an open and collaborative culture. It would not work well
for organizations where large parts
of the codebase are kept private or hidden
from other groups.
At Google, we have found that, with some
investment, the monolithic model of
source management can scale successfully to a codebase with more than one
billion files, 35 million commits, and
thousands of users around the globe. As
the scale and complexity of projects both
inside and outside Google continue to
grow, we hope the analysis and workflow
described in this article can benefit others weighing decisions on the long-term
structure for their codebases.
Acknowledgments
We would like to recognize all current
and former members of the Google
Developer Infrastructure teams for
their dedication in building and
maintaining the systems referenced
in this article, as well as the many
people who helped in reviewing the
article; in particular: Jon Perkins and
Ingo Walther, the current Tech Leads
of Piper; Kyle Lippincott and Crutcher
Dunnavant, the current and former