ing it to C++11 or rolling out performance optimizations[9]) are often managed centrally by dedicated codebase maintainers. Such efforts can touch half a million variable declarations or function-call sites spread across hundreds of thousands of files of source code. Because all projects are centrally stored, teams of specialists can do this work for the entire company, rather than requiring many individuals to develop their own tools, techniques, or expertise.
As an example of how these benefits play out, consider Google’s Compiler team, which ensures developers at Google employ the most up-to-date toolchains and benefit from the latest improvements in generated code and “debuggability.” The monolithic repository provides the team with full visibility of how various languages are used at Google and allows them to do codebase-wide cleanups to prevent changes from breaking builds or creating issues for developers. This greatly simplifies compiler validation, thus shortening compiler release cycles and making it possible for Google to safely ship regular compiler releases (typically more than 20 per year for the C++ compilers).
Using the data generated by performance and regression tests run on nightly builds of the entire Google codebase, the Compiler team tunes default compiler settings to be optimal. For example, due to this centralized effort, Google’s Java developers all saw their garbage collection (GC) CPU consumption decrease by more than 50% and their GC pause time decrease by 10%–40% from 2014 to 2015. In addition, when software errors are discovered, it is often possible for the team to add new warnings to prevent recurrence. In conjunction with this change, they scan the entire repository to find and fix other instances of the software issue being addressed, before turning the warnings into new compiler errors. Having the compiler reject patterns that proved problematic in the past is a significant boost to Google’s overall code health.
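A minimal sketch of the warn, fix everywhere, then error workflow described above. It assumes the repository can be modeled as an in-memory mapping from file path to contents, and it uses a made-up problematic pattern (calls to a hypothetical `legacy_hash(` function) matched with a plain regular expression; real compiler diagnostics work on parsed code, not text. The helper `can_promote_to_error` is likewise hypothetical: it reports whether a repository-wide scan found no remaining instances, which is the point at which the warning could safely become an error.

```python
import re
from dataclasses import dataclass

# Hypothetical problematic pattern: any call to a made-up `legacy_hash(` function.
# A regex is a crude stand-in for a real compiler diagnostic, which inspects parsed code.
PATTERN = re.compile(r"\blegacy_hash\(")


@dataclass(frozen=True)
class Finding:
    path: str
    line: int


def scan(repo: dict[str, str]) -> list[Finding]:
    """Scan every file in the repository (path -> contents) for the pattern."""
    findings = []
    for path, text in sorted(repo.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PATTERN.search(line):
                findings.append(Finding(path, lineno))
    return findings


def can_promote_to_error(repo: dict[str, str]) -> bool:
    """The check stays a warning until a repository-wide scan finds no instances."""
    return not scan(repo)


if __name__ == "__main__":
    repo = {
        "a/encode.cc": "auto h = legacy_hash(s);\n",
        "b/index.cc": "auto h = fast_hash(s);\n",
    }
    print(scan(repo))                  # one finding, in a/encode.cc at line 1
    print(can_promote_to_error(repo))  # False: an instance remains
    repo["a/encode.cc"] = "auto h = fast_hash(s);\n"  # fix the last instance
    print(can_promote_to_error(repo))  # True: the warning could now become an error
```

The design point the sketch tries to capture is ordering: because every file is visible in one place, the full scan-and-fix pass can finish before the stricter diagnostic is enabled, so turning it into an error breaks no builds.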
Storing all source code in a common version-control repository allows codebase maintainers to efficiently analyze and change Google’s source code. Tools like Refaster[11] and ClangMR[15] (often used in conjunction with Rosie) make use of the monolithic view of Google’s source to perform high-level transformations of source code. The monolithic codebase captures all dependency information. Old APIs can be removed with confidence, because it can be proven that all callers have been migrated to new APIs. A single common repository vastly simplifies these tools by ensuring atomicity of changes and a single global view of the entire repository at any given time.
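To illustrate why a complete dependency graph lets maintainers prove that an old API has no remaining callers, here is a minimal Python sketch. The graph, its Bazel-style target labels, and the `reverse_deps` and `safe_to_delete` helpers are all hypothetical. The conclusion only holds under the assumption that the graph really contains every user of the target, which is what a single repository makes plausible.

```python
from collections import deque

# Hypothetical build graph: each target maps to the targets it depends on.
DEPS = {
    "//app:server": ["//lib:http", "//lib:old_api"],
    "//tools:cli": ["//lib:new_api"],
    "//lib:http": ["//lib:strings"],
    "//lib:old_api": ["//lib:strings"],
    "//lib:new_api": ["//lib:strings"],
    "//lib:strings": [],
}


def reverse_deps(deps: dict[str, list[str]], target: str) -> set[str]:
    """Return every target that depends on `target`, directly or transitively."""
    users: dict[str, list[str]] = {}
    for src, dsts in deps.items():
        for dst in dsts:
            users.setdefault(dst, []).append(src)  # invert each edge
    seen: set[str] = set()
    queue = deque([target])
    while queue:  # breadth-first walk over the inverted edges
        for user in users.get(queue.popleft(), []):
            if user not in seen:
                seen.add(user)
                queue.append(user)
    return seen


def safe_to_delete(deps: dict[str, list[str]], target: str) -> bool:
    """With a complete graph, no reverse dependencies means no callers remain."""
    return not reverse_deps(deps, target)


if __name__ == "__main__":
    print(safe_to_delete(DEPS, "//lib:old_api"))  # False: //app:server still uses it
    migrated = {**DEPS, "//app:server": ["//lib:http", "//lib:new_api"]}
    print(safe_to_delete(migrated, "//lib:old_api"))  # True: every user migrated
```

In a setup split across many repositories, an empty result would only mean "no users found in the repositories we looked at", which is a much weaker guarantee.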
Costs and trade-offs. While it is important to note that a monolithic codebase in no way implies monolithic software design, working with this model involves some downsides, as well as trade-offs, that must be considered.
These costs and trade-offs fall into three categories:
• Tooling investments for both development and execution;
• Codebase complexity, including unnecessary dependencies and difficulties with code discovery; and
• Effort invested in code health.
In many ways the monolithic repository yields simpler tooling since there is only one system of reference for tools working with source. However, it is also necessary that tooling scale to the size of the repository. For instance, Google has written a custom plug-in for the Eclipse integrated development environment (IDE) to make working with a massive codebase possible from the IDE. Google’s code-indexing system supports static analysis, cross-referencing in the code-browsing tool, and rich IDE functionality for Emacs, Vim, and other development environments. These tools require ongoing investment to manage the ever-increasing scale of the Google codebase.
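As a rough illustration of the kind of data a code-indexing system provides for cross-referencing, the sketch below builds a toy index over Python source using the standard `ast` module: for each function name it records where the function is defined and where it is called. This is a simplified stand-in, not a description of Google’s indexer, which covers many languages and far more relationships; `build_xref` is a hypothetical name, and the sketch only resolves plain `name(...)` calls, ignoring method calls, imports, and scoping.

```python
import ast
from collections import defaultdict

Location = tuple[str, int]  # (file path, line number)


def build_xref(
    files: dict[str, str],
) -> tuple[dict[str, list[Location]], dict[str, list[Location]]]:
    """Index function definitions and direct call sites across all files."""
    defs = defaultdict(list)
    calls = defaultdict(list)
    for path, source in files.items():
        for node in ast.walk(ast.parse(source, filename=path)):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                defs[node.name].append((path, node.lineno))
            elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                # Only plain `name(...)` calls; `obj.method(...)` is skipped.
                calls[node.func.id].append((path, node.lineno))
    # Sort locations so results do not depend on traversal order.
    return (
        {name: sorted(locs) for name, locs in defs.items()},
        {name: sorted(locs) for name, locs in calls.items()},
    )


if __name__ == "__main__":
    files = {
        "util.py": "def parse(s):\n    return s.split()\n",
        "main.py": "from util import parse\n\ndef run():\n    return parse('a b')\n",
    }
    defs, calls = build_xref(files)
    print(defs["parse"])   # [('util.py', 1)]
    print(calls["parse"])  # [('main.py', 4)]
```

Even this toy version has to parse every file it indexes, which hints at why indexing an entire monolithic codebase calls for dedicated, continuously maintained infrastructure.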
Beyond the investment in building and maintaining scalable tooling, Google must also cover the cost of running these systems, some of which are very computationally intensive. Much of Google’s internal suite of developer tools, including the automated test infrastructure and highly scalable build infrastructure, is critical for supporting the size of the monolithic codebase. It is thus necessary to make trade-offs concerning how frequently to run this tooling to balance the cost of execution vs. the benefit of the data provided to developers.
An important aspect of Google culture that encourages code quality is the expectation that all code is reviewed before being committed to the repository.