source-control implementation for
hosting the central repository, as
discussed later. Following this transition, automated commits to the repository began to increase. Growth in
the commit rate continues primarily
due to automation.
Managing this scale of repository
and activity on it has been an ongoing
challenge for Google. Despite several
years of experimentation, Google
was not able to find a commercially available or open source version-control system to support such scale
in a single repository. The Google
proprietary system that was built to
store, version, and vend this codebase
is code-named Piper.
Background
Before reviewing the advantages
and disadvantages of working with
a monolithic repository, some background on Google’s tooling and workflows is needed.
Piper and CitC. Piper stores a single
large repository and is implemented on top of standard Google infrastructure, originally Bigtable,2 now
Spanner.3 Piper is distributed over
10 Google data centers around the
world, relying on the Paxos6 algorithm
to guarantee consistency across replicas. This architecture provides a
high level of redundancy and helps
optimize latency for Google software developers, no matter where
they work. In addition, caching and
asynchronous operations hide much
of the network latency from developers. This is important because gaining the full benefit of Google’s cloud-based toolchain requires developers
to be online.
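To make the consistency guarantee more concrete: in classic Paxos, a value is chosen once a strict majority of acceptors has accepted it. The sketch below assumes, purely for illustration, one replica per data center and plain majority quorums. The article does not describe Piper's actual quorum configuration, and the function names are hypothetical.

```python
def majority_quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    if replicas < 1:
        raise ValueError("replicas must be positive")
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """Replicas that can be unreachable while a majority still responds."""
    return replicas - majority_quorum(replicas)


# Data-center count taken from the text; one replica per site is an assumption.
REPLICAS = 10

print(majority_quorum(REPLICAS))     # 6
print(tolerated_failures(REPLICAS))  # 4
```

Under these assumptions, a write needs acknowledgment from 6 of the 10 sites, and up to 4 sites can be unavailable without blocking commits. Real deployments may place or weight replicas differently.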
Google relied on one primary Perforce
instance, hosted on a single machine,
coupled with custom caching infrastructure1 for more than 10 years prior to the
launch of Piper. Continued scaling of
Figure 1. Millions of changes committed to Google’s central repository over time. [Plot: x-axis, January 2000 to January 2015; y-axis, 10 million to 40 million changes.]
Figure 2. Human committers per week. [Plot: x-axis, January 2010 to January 2015; y-axis, unique human users per week, 5,000 to 15,000.]
Figure 3. Commits per week. [Plot: x-axis, January 2010 to January 2015; y-axis, 75,000 to 300,000 commits; series: human commits, total commits.]
Google repository statistics, January 2015.
Total number of files      1 billion
Number of source files     9 million
Lines of source code       2 billion
Depth of history           35 million commits
Size of content            86 TB
Commits per workday        40,000