Dependency-refactoring and cleanup tools are helpful, but, ideally, code
owners should be able to prevent unwanted dependencies from being created in the first place. In 2011, Google
started relying on the concept of
API visibility, setting the default
visibility of new APIs to “private.”
This forces developers to explicitly
mark APIs as appropriate for use by
other teams. A lesson learned from
Google’s experience with a large
monolithic repository is that such mechanisms should be put in place as soon
as possible to encourage more hygienic
dependency structures.
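
The private-by-default rule can be illustrated with a minimal sketch. It is not Google's build tooling: the `Target` class, the `may_depend_on` function, and the package names are hypothetical, and the only behavior taken from the text is that a new API is private unless its owner explicitly opens it up.

```python
from dataclasses import dataclass, field

PRIVATE = "private"  # hypothetical marker: usable only inside the owning package
PUBLIC = "public"    # hypothetical marker: usable by any package


@dataclass
class Target:
    """A hypothetical build target with an owning package and a visibility list."""
    name: str
    package: str
    # New targets default to private, so owners must opt in to sharing.
    visibility: list = field(default_factory=lambda: [PRIVATE])


def may_depend_on(consumer_package: str, target: Target) -> bool:
    """Return True if code in consumer_package is allowed to depend on target."""
    if consumer_package == target.package:
        return True  # a package can always use its own targets
    if PUBLIC in target.visibility:
        return True
    # Otherwise the consumer must be listed explicitly by the owner.
    return consumer_package in target.visibility
```

With this sketch, `Target("parser", "search/query")` is usable only from `search/query`; another team gains access only after the owner adds that team's package (or `PUBLIC`) to `visibility`, which makes every cross-team dependency a deliberate decision.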
The fact that most Google code is
available to all Google developers has
led to a culture where some teams expect other developers to read their
code rather than providing them with
separate user documentation. There
are pros and cons to this approach. No
effort goes toward writing or keeping
documentation up to date, but developers sometimes read more than the
API code and end up relying on underlying implementation details. This behavior can create a maintenance burden for teams that then have trouble
deprecating features they never meant
to expose to users.
This model also requires teams to
collaborate with one another when using open source code. An area of the
repository is reserved for storing open
source code (developed at Google or
externally). To prevent dependency
conflicts, as outlined earlier, it is important that only one version of an
open source project be available at
any given time. Teams that use open
source software are expected to occasionally spend time upgrading their
codebase to work with newer versions
of open source libraries when library
upgrades are performed.
Google invests significant effort in
maintaining code health to address
some issues related to codebase complexity
and dependency management. For instance, special tooling
automatically detects and removes
dead code, splits large refactorings
and automatically assigns code reviews
(as through Rosie), and marks
APIs as deprecated. Human effort is
required to run these tools and manage
the corresponding large-scale
code changes. A cost is also incurred
The monolithic model makes it
easier to understand the structure of
the codebase, as there is no crossing of
repository boundaries between dependencies.
However, as the scale increases,
code discovery can become more
difficult, as standard tools like grep
bog down. Developers must be able
to explore the codebase, find relevant
libraries, and see how to use them
and who wrote them. Library authors
often need to see how their APIs are
being used. This requires a significant
investment in code search and
browsing tools. However, Google has
found this investment highly rewarding,
improving the productivity of all
developers, as described in more detail
by Sadowski et al. 9
Access to the whole codebase encourages extensive code sharing and
reuse. Some would argue this model,
which relies on the extreme scalability of the Google build system, makes
it too easy to add dependencies and
reduces the incentive for software developers to produce stable and well-thought-out APIs.
Due to the ease of creating dependencies, it is common for teams to not think
about their dependency graph, making
code cleanup more error-prone. Unnecessary dependencies can increase
project exposure to downstream build
breakages, lead to binary size bloating,
and create additional work in building
and testing. In addition, lost productivity ensues when abandoned projects
that remain in the repository continue
to be updated and maintained.
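
To make the exposure argument concrete, the following sketch computes the transitive dependency set of a target. The graph and target names are invented for illustration. The point is only that one convenient edge to a large library can pull in everything that library depends on, and a breakage in any of those targets can then break the build.

```python
def transitive_deps(graph, target):
    """Return every target reachable from `target` by following dependency edges."""
    seen = set()
    stack = list(graph.get(target, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph.get(dep, ()))
    return seen


# Hypothetical dependency graph: each key depends on the targets in its list.
graph = {
    "app": ["http", "util"],
    "util": ["logging", "big_lib"],  # assume util needs one helper from big_lib
    "big_lib": ["codec", "crypto", "i18n"],
}

# The same graph with the single edge util -> big_lib removed.
trimmed = dict(graph, util=["logging"])

exposure_before = transitive_deps(graph, "app")
exposure_after = transitive_deps(trimmed, "app")
```

In this example `app` is exposed to seven targets before the edge is removed and three afterward, even though `app` itself never names `big_lib`.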
Several efforts at Google have
sought to rein in unnecessary dependencies. Tooling exists to help identify
and remove unused dependencies, that is,
dependencies linked into the product binary for historical or accidental
reasons but no longer needed. Tooling
also exists to identify underutilized
dependencies, that is, dependencies on
large libraries that are mostly unneeded, as candidates for refactoring. 7 One
such tool, Clipper, relies on a custom
Java compiler to generate an accurate
cross-reference index. It then uses the
index to construct a reachability graph
and determine what classes are never
used. Clipper is useful in guiding dependency-refactoring efforts by finding
targets that are relatively easy to remove
or break up.
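
The reachability step can be sketched as follows. This is not Clipper's implementation: the cross-reference index is modeled as a plain dictionary from each class to the classes it references, and the function name, class names, and choice of roots are assumptions made for illustration.

```python
from collections import deque


def unreachable_classes(references, roots):
    """Return classes never reached from any root via the cross-reference index."""
    reached = set(roots)
    queue = deque(roots)
    while queue:
        cls = queue.popleft()
        for used in references.get(cls, ()):
            if used not in reached:
                reached.add(used)
                queue.append(used)
    # Every class mentioned anywhere in the index, as a referrer or a referent.
    all_classes = set(references)
    for used in references.values():
        all_classes.update(used)
    return all_classes - reached


# Hypothetical index: class -> classes it references.
index = {
    "Main": ["Parser", "Logger"],
    "Parser": ["Tokenizer"],
    "LegacyExporter": ["XmlWriter"],
    "XmlWriter": [],
}

never_used = unreachable_classes(index, ["Main"])
```

Here `never_used` contains `LegacyExporter` and `XmlWriter`: they reference each other's code but nothing reachable from `Main` references them, so they are candidates for removal. The result is only as good as the index, which is why the text stresses an accurate cross-reference index; references made through reflection, for example, would be invisible to a sketch like this one.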