observing the behavior of a system,
but preventing dependency problems
before they reach production requires
a more active strategy. Implementing
dependency control ensures that every new
dependency is verified to fit into the DAG
before it enters use. This gives system
designers the freedom to add new dependencies where they are valuable,
while eliminating much of the risk that
comes from the uncontrolled growth of dependencies.
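As a rough illustration of the check that dependency control implies, the Python sketch below approves a new dependency only if the graph stays acyclic. The DependencyGraph class and request_dependency method are hypothetical names, not part of any real tool; the sketch assumes an edge client -> server means "client may send RPCs to server" and that self-calls never need approval.

```python
from collections import defaultdict


class DependencyGraph:
    """Hypothetical registry of approved dependencies; an edge
    client -> server means "client may send RPCs to server"."""

    def __init__(self):
        self.edges = defaultdict(set)

    def _reaches(self, start, target):
        # Iterative depth-first search from start looking for target.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(self.edges[node])
        return False

    def request_dependency(self, client, server):
        """Record client -> server only if the graph stays a DAG."""
        if client == server:
            return True  # self-calls are always allowed; no edge is stored
        if self._reaches(server, client):
            return False  # the new edge would close a cycle
        self.edges[client].add(server)
        return True


graph = DependencyGraph()
graph.request_dependency("frontend", "storage")  # approved
graph.request_dependency("storage", "network")   # approved
graph.request_dependency("network", "frontend")  # rejected: would form a cycle
```

Rejecting the request before the dependency is used is the point: the cycle never reaches production, so there is nothing to untangle later.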
Silvia Esparrachiari Ghirotti has been at Google for eight
years, working in the areas of social products, user data
privacy, and fighting abuse. She currently leads the team
developing tools for dependency control.
Tanya Reilly is the principal engineer for infrastructure
at Squarespace. She previously spent 12 years improving
the resilience of low-level services at Google, including
introducing a layered model for dependency control.
Ashleigh Rentz is a technical writer whose interests
include blameless postmortems and wearable
technology. She spent 14 years at Google, most recently
producing internal documentation for SRE and Google
Copyright held by owners/authors.
Publication rights licensed to ACM. $15.00.
to exchange RPCs with each other.
Thus, they are not allowed to depend
on each other.
One way to generalize dependency
authorization in a DAG model is to let
directed edges represent can-send-to
relations. Each node also has an implicit
self-referencing edge (that is, every node
can send RPCs to itself). The can-send-to
relation is also transitive: if A can send
RPCs to B, and B can send RPCs to C,
then A can send RPCs to C.
Note, however, that if B can send RPCs to A,
and B can send RPCs to C, that does not imply
that A can send RPCs to C or vice versa.
Can-send-to is a directed relation. If
there were a can-send-to relation in
both directions (from A to B and from B
to A), this would constitute a cycle and
the model wouldn’t be a DAG.
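The can-send-to rules above can be checked with an ordinary graph search. Below is a minimal Python sketch, assuming the model is stored as a dictionary from each node to the nodes it has direct can-send-to edges to (a representation chosen for illustration, not taken from the source). The reflexive case is handled first; a breadth-first search then handles transitivity.

```python
from collections import deque


def can_send_to(edges, sender, receiver):
    """Reflexive, transitive can-send-to check over directed edges.

    edges maps each node to the nodes it has a direct edge to.
    """
    if sender == receiver:
        return True  # every node may send RPCs to itself
    queue, seen = deque([sender]), {sender}
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt == receiver:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Transitivity: A -> B and B -> C let A send to C, but not C to A.
chain = {"A": {"B"}, "B": {"C"}}
# Shared source: B -> A and B -> C say nothing about A and C.
fan_out = {"B": {"A", "C"}}
```

With the fan_out edges, the search confirms that neither A nor C can reach the other, matching the text: edges only ever grant permission in their own direction.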
Figure 6 shows how the pseudocode
for authorizing RPCs in a DAG model
could be written.
The isolated model can be combined with the layered model, so that
each region bootstraps in isolation and, within each region,
dependencies also follow the layer order, as illustrated in Figure 7.
Figure 8 shows the pseudocode for
combining different models.
Be careful when combining models
that you do not isolate critical
components by combining mutually exclusive
models. Simple models, like the layered
and isolated models described here, are
usually easier to understand, and the
results of combining them are easier to
predict. It can be challenging to predict
the combined logic of two or more
complex models. For example, suppose
there are two models based on
the geographical locality of machines.
It’s easy to see that assigning
locality “Tokyo” from one model
and locality “London” from the other
model will result in an empty set, since
no machine can be physically located
in London and Tokyo at the same time.
But if there are two tree models
based on locality, such as one for
city, time zone, and country, and
another for metro, voting zone, and country,
it might be difficult to verify which
combinations of values will return an empty set.
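To make the empty-set problem concrete, this Python sketch enumerates every combination of values from several locality models and reports those that select no machines at all. The models and machine names are hypothetical and flattened to a single level (city and metro) for brevity. Brute-force enumeration only stays practical while the number of combinations, the product of the models' sizes, is small.

```python
from itertools import product

# Hypothetical inventories: machines each model assigns to each locality.
CITY_MODEL = {"Tokyo": {"m1", "m2"}, "London": {"m3"}}
METRO_MODEL = {"Tokyo": {"m1", "m2"}, "London": {"m3", "m4"}}


def empty_combinations(*models):
    """Return every combination of values, one per model, whose machine
    sets have an empty intersection."""
    if not models:
        return []
    empty = []
    for values in product(*(sorted(m) for m in models)):
        machines = set.intersection(*(m[v] for m, v in zip(models, values)))
        if not machines:
            empty.append(values)
    return empty


print(empty_combinations(CITY_MODEL, METRO_MODEL))
# [('London', 'Tokyo'), ('Tokyo', 'London')]
```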
With the growth of massive interdependent
software systems, dependency
management is a crucial part of system
and software design. Most organizations
will benefit from tracking existing
dependencies to help model their
latency, SLOs, and security threats.
Many will also find it useful to limit the
growth of new dependencies for data
integrity and to reduce the risk of outages.
Modeling infrastructure as a DAG
makes it easier to verify that no
dependency will prevent a system
from bootstrapping in isolation.
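The link between acyclicity and bootstrapping can be shown with a topological sort: assuming every dependency must be running before its dependents start, a valid start-up order exists exactly when the dependency graph has no cycle. As a sketch, Python's standard graphlib module (Python 3.9+) computes such an order and raises CycleError otherwise; the service names and the depends_on mapping are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def bootstrap_order(depends_on):
    """Return a start-up order in which every service starts after all of
    its dependencies, or None if a cycle makes that impossible.

    depends_on maps each service to the services it needs at start-up.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


services = {
    "frontend": {"storage", "auth"},
    "storage": {"network"},
    "auth": {"network"},
    "network": set(),
}
order = bootstrap_order(services)  # network first, frontend last
```

Ties between independent services (storage and auth here) may come out in either order; only the dependency constraints are guaranteed.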
Dependencies can be tracked by
Figure 6. Pseudocode for model authorization.
func isAllowedByModel(rpc, model):
    clientNode = model.resolveNode(rpc.sender)
    serviceNode = model.resolveNode(rpc.receiver)
    return model.hasTransitiveConnection(clientNode, serviceNode)
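One way Figure 6 might translate into runnable Python is sketched below, under assumptions not taken from the source: Rpc, Model, node_of, and the snake_case method names are illustrative, and transitive connections are precomputed once with a Warshall-style closure so each authorization check is a set lookup. The example layer ordering (privacy above storage above security above network, each layer calling only the layers below it) is also assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rpc:
    sender: str    # name of the calling service (illustrative)
    receiver: str  # name of the called service (illustrative)


class Model:
    """A dependency model: services resolve to nodes; edges are can-send-to."""

    def __init__(self, node_of, edges):
        self.node_of = node_of  # service name -> node in this model
        nodes = set(node_of.values()) | set(edges)
        for targets in edges.values():
            nodes |= set(targets)
        # Reflexive start: every node reaches itself and its direct targets.
        self.reach = {n: {n} | set(edges.get(n, ())) for n in nodes}
        # Warshall-style closure: if i reaches k, i reaches all k reaches.
        for k in nodes:
            for i in nodes:
                if k in self.reach[i]:
                    self.reach[i] |= self.reach[k]

    def resolve_node(self, service):
        # Raises KeyError for services this model does not know about.
        return self.node_of[service]

    def has_transitive_connection(self, client_node, service_node):
        return service_node in self.reach[client_node]


def is_allowed_by_model(rpc, model):
    client_node = model.resolve_node(rpc.sender)
    service_node = model.resolve_node(rpc.receiver)
    return model.has_transitive_connection(client_node, service_node)


# Hypothetical layered model: each layer may call the layers below it.
layers = Model(
    node_of={"user-api": "privacy", "blob-store": "storage",
             "authz": "security", "dns": "network"},
    edges={"privacy": ["storage"], "storage": ["security"],
           "security": ["network"]},
)
```

Precomputing the closure costs up to one set per node holding every other node, which is cheap for coarse models such as layers or regions and makes the per-RPC check a single lookup.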
Figure 8. Pseudocode for multiple model authorization.
foreach model in modelCollection:
    // Checks if the RPC is allowed by the model.
    if !isAllowedByModel(rpc, model):
        return false
// If no models reject the RPC, then it should be allowed.
return true
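The loop in Figure 8 can be sketched in Python as below. To keep the example short, each model is reduced to a predicate over (sender, receiver), and the placement table, region names, and layer order are hypothetical. Combining a layered model with an isolated model this way mirrors the idea of Figure 7: an RPC must stay within one region and must not go up the layer stack.

```python
from typing import Callable, Iterable, Tuple

# Simplification: a model is a predicate over (sender, receiver).
ModelCheck = Callable[[str, str], bool]

# Hypothetical placement of services: (region, layer).
PLACEMENT = {
    "privacy-eu": ("eu", "privacy"),
    "storage-eu": ("eu", "storage"),
    "network-eu": ("eu", "network"),
    "storage-us": ("us", "storage"),
}
# Assumed layer order, top to bottom; a service may call its own layer
# or any layer below it.
LAYER_RANK = {"privacy": 0, "storage": 1, "security": 2, "network": 3}


def layered_model(sender: str, receiver: str) -> bool:
    return LAYER_RANK[PLACEMENT[sender][1]] <= LAYER_RANK[PLACEMENT[receiver][1]]


def isolated_model(sender: str, receiver: str) -> bool:
    # RPCs must stay inside one region so each region bootstraps alone.
    return PLACEMENT[sender][0] == PLACEMENT[receiver][0]


def is_allowed(rpc: Tuple[str, str], models: Iterable[ModelCheck]) -> bool:
    sender, receiver = rpc
    for model in models:
        # Any model that rejects the RPC vetoes it.
        if not model(sender, receiver):
            return False
    # If no model rejects the RPC, it is allowed.
    return True


combined = [layered_model, isolated_model]
```

Note that an empty model collection allows every RPC, following Figure 8's "if no models reject" default; a deployment might instead prefer to deny RPCs that no model covers.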
Figure 7. Combined model. (Three isolated regions, each layered from top to bottom as privacy, storage, security, network.)