SOFTWARE PROJECTS TODAY are getting more and
more complex. Code accumulates over the years as
organization growth increases the volume of daily
commits. Projects that used to take minutes to
complete a full build now start with fetching from the
repository and may require an hour or more to build.
A developer who maintains the infrastructure
constantly has to add more machines to support the
ever-increasing workload for builds and tests, at the
same time facing pressure from users who are unhappy
with the long submit time. Running more parallel
jobs helps, but this is limited by the number of cores
on the machine and the parallelizability of the build.
Incremental builds certainly help, but might not apply if
clean builds are needed for production releases. Having
many build machines also increases the maintenance burden.
Bazel (https://bazel.build/) provides the power to run build tasks remotely and with massive parallelism. Not every organization, however, can afford to have an in-house remote execution farm. For most projects a remote cache is a great way to boost build and test performance by sharing build outputs and test outputs among build workers and workstations. This article details the remote cache feature in Bazel ( https://docs.bazel.build/
and examines options for building your own remote cache service. In practice, this can reduce the build time by almost an order of magnitude.

How Does It Work?
Users run Bazel ( https://docs.bazel.
by specifying targets to build or test. After analyzing the build rules, Bazel determines the dependency graph of actions needed to fulfill the targets. This process is incremental: Bazel skips the actions already completed by the last invocation in the workspace directory. It then enters the execution phase and executes actions according to the dependency graph. This is when the remote cache and execution systems come into play.
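The two phases can be pictured with a small Python sketch. This is not how Bazel is implemented; the function name `plan_execution`, the action names, and the idea of a plain set of completed actions are all assumptions made for illustration. It shows only the general shape: order the actions by their dependencies, then skip the ones that are already done.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_execution(deps, completed):
    """Return the actions still to run, in dependency order.

    deps maps each (hypothetical) action name to the set of actions it
    depends on; completed holds actions finished by an earlier invocation.
    """
    # static_order() yields every action after all of its dependencies.
    order = TopologicalSorter(deps).static_order()
    return [action for action in order if action not in completed]


# A toy graph: two compile actions feed a link action, which feeds a test.
deps = {
    "compile_a": set(),
    "compile_b": set(),
    "link": {"compile_a", "compile_b"},
    "test": {"link"},
}

print(plan_execution(deps, completed={"compile_a"}))
# ['compile_b', 'link', 'test']
```

A real build tool decides what to skip by detecting changes to inputs rather than by consulting a fixed set, so the `completed` argument here is a deliberate simplification.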
An action in Bazel consists of a command, arguments to the command, and environment variables, as well as lists of input files and output files. It also contains the description of the platform for remote execution, which is outside the scope of this article. The information about an action can be encoded
into a protocol buffer (
works as a fingerprint of the action. It contains a digest of the combined command, arguments, and environment variables, together with a Merkle tree digest of the input files. The Merkle tree is generated as follows: files are the leaf nodes and are digested by their content; directories are the tree nodes and are digested using the digests of their subdirectories and child files. Bazel uses SHA-256 as the default hash function to compute the digests.
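To make the fingerprinting concrete, here is a minimal Python sketch of the scheme described above. It is an illustration only: the helper names (`digest_tree`, `action_digest`), the in-memory representation of files and directories, and the JSON encoding are assumptions made for this example, whereas Bazel encodes this information as protocol buffers. Only the overall shape follows the text: files are hashed by content, directories are hashed from the digests of their children, and SHA-256 is used throughout.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def digest_tree(node) -> str:
    """Merkle digest of an input tree.

    A node is either bytes (file content, a leaf) or a dict mapping
    names to child nodes (a directory). This layout is hypothetical.
    """
    if isinstance(node, bytes):
        # Leaf: a file is digested by its content.
        return sha256_hex(node)
    # Directory: digest the sorted list of (kind, name, child digest).
    entries = [
        ["file" if isinstance(child, bytes) else "dir", name, digest_tree(child)]
        for name, child in sorted(node.items())
    ]
    return sha256_hex(json.dumps(entries).encode("utf-8"))


def action_digest(command, env, input_root) -> str:
    """Fingerprint of an action: command digest plus input tree digest."""
    command_digest = sha256_hex(
        json.dumps({"argv": list(command), "env": sorted(env.items())}).encode("utf-8")
    )
    action = {
        "command_digest": command_digest,
        "input_root_digest": digest_tree(input_root),
    }
    return sha256_hex(json.dumps(action, sort_keys=True).encode("utf-8"))


# Example: the same action yields the same fingerprint; editing any
# input file changes it.
inputs = {"src": {"main.cc": b"int main() { return 0; }\n"}}
command = ["gcc", "-c", "src/main.cc"]
env = {"PATH": "/usr/bin"}

before = action_digest(command, env, inputs)
inputs["src"]["main.cc"] = b"int main() { return 1; }\n"
after = action_digest(command, env, inputs)
print(before != after)  # True
```

Because a directory digest depends on every digest beneath it, a change to one file propagates up to the root, so a single comparison of root digests is enough to tell whether any input differs.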
Before executing an action, Bazel
constructs the protocol buffer using the
process described here. The buffer is
then digested to look up the remote ac-
Article development led by
Save time by sharing and
reusing build and test output.
BY ALPHA LAM