ure to track external dependencies may
also introduce bootstrapping risks. As
SaaS becomes more popular and as
more companies outsource infrastructure and functionality, cyclic dependencies may start to cross companies.
For example, if two storage companies
were to use each other’s systems to
store boot images, a disaster that affected both companies would make recovery difficult or impossible.
Directed Acyclic Graphs
At its essence, a service dependency is
the need for a piece of data that is remote to the service. It could be a configuration file stored in a file system,
or a row for user data in a database, or
a computation performed by the back
end. The way this remote data is accessed by the service may vary. For the
sake of simplicity, let’s assume all remote data or computation is provided
by a serving back end via remote procedure calls (RPCs).
As just described, dependency
cycles among systems can make it virtually impossible to recover after an
outage. The outage of a critical dependency propagates to its dependents,
so the natural place to begin restoring
the flow of data is the top of the dependency chain. With a dependency cycle,
however, there is no clear place to begin recovery efforts since every system
is dependent on another in the chain.
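The recovery-order argument above can be made concrete with a small sketch. It assumes the dependencies are already known and are written as a mapping from each service to the services it depends on. The service names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) is just one convenient way to compute the ordering, not something the article prescribes.

```python
from graphlib import CycleError, TopologicalSorter


def recovery_order(depends_on):
    """Return services ordered so each one comes after its dependencies.

    `depends_on` maps a service to the set of services it needs.
    Returns None when a cycle leaves no valid place to start.
    """
    try:
        # static_order() yields dependencies before their dependents.
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError:
        return None


# Hypothetical chain: frontend needs checkout, checkout needs storage.
acyclic = {"frontend": {"checkout"}, "checkout": {"storage"}, "storage": set()}

# Same services, but storage now also needs frontend, closing a loop.
cyclic = {"frontend": {"checkout"}, "checkout": {"storage"}, "storage": {"frontend"}}

print(recovery_order(acyclic))  # ['storage', 'checkout', 'frontend']
print(recovery_order(cyclic))   # None
```

In the acyclic case the service with no dependencies comes first, which is where restoration would begin. In the cyclic case no such service exists, so the function has nothing to return.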
One way to identify cycles is to build
a dependency graph representing all
services in the system and all RPCs
exchanged among them. Begin building the graph by putting each service
on a node of the graph and drawing
directed edges to represent the outgoing RPCs. Once all services are placed
in the graph, the existing dependency
cycles can be identified using common
algorithms, such as attempting a topological
sort via depth-first search. If no cycles are found, the services’
dependencies can be represented by a
directed acyclic graph (DAG).
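As a minimal sketch of the graph-based check described above, the following depth-first search reports one cycle if any exists. It assumes the outgoing RPCs have already been collected into a mapping from each service to the services it calls. The function name `find_cycle` and the example services are hypothetical and chosen only for illustration.

```python
def find_cycle(calls):
    """Return one dependency cycle as a list of services, or None for a DAG.

    `calls` maps each service to the list of services it sends RPCs to.
    """
    UNVISITED, ON_PATH, DONE = 0, 1, 2
    state = {}
    path = []  # services on the current depth-first path

    def visit(service):
        state[service] = ON_PATH
        path.append(service)
        for callee in calls.get(service, ()):
            callee_state = state.get(callee, UNVISITED)
            if callee_state == ON_PATH:
                # Back edge: the callee is already on the current path.
                return path[path.index(callee):] + [callee]
            if callee_state == UNVISITED:
                cycle = visit(callee)
                if cycle:
                    return cycle
        path.pop()
        state[service] = DONE
        return None

    for service in calls:
        if state.get(service, UNVISITED) == UNVISITED:
            cycle = visit(service)
            if cycle:
                return cycle
    return None


# Hypothetical notification system: each side calls the other.
notify = {"sender": ["controller"], "controller": ["sender"]}

# Hypothetical retail services with no cycle.
retail = {
    "frontend": ["checkout", "thumbnails"],
    "checkout": ["storage"],
    "thumbnails": ["storage"],
    "storage": [],
}

print(find_cycle(notify))  # ['sender', 'controller', 'sender']
print(find_cycle(retail))  # None
```

The recursive form keeps the sketch short. A graph with thousands of services in a single call chain could exceed Python's default recursion limit, so an iterative traversal with an explicit stack may be a better fit at that scale.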
What happens when a cycle is
found? Sometimes, it’s possible to remove
a cycle by inverting the dependency,
as shown in Figure 2. One example
is a notification system where the
senders notify the controllers about
new data, and the controller then pulls
data from the senders. The cycle here
can be easily removed by allowing the
to recover the same system. Without
this key information about the system’s
internals, the oncall engineer’s
response was significantly obstructed.
In the era of monolithic software
development, dependency management was relatively clear-cut. While a
monolithic binary may perform many
functions, it generally provides a single
failure domain containing all of the
binary’s functionality. Keeping track
of a small number of large binaries
and storage systems is not difficult, so
an owner of a monolithic architecture
can easily draw a dependency diagram,
perhaps like that in Figure 1.
The software industry’s move toward
the microservices model makes
dependency management much more
difficult. As Leslie Lamport said in
1987, “A distributed system is one in
which the failure of a computer you
didn’t even know existed can render
your own computer unusable.”⁵ Large
binaries are now frequently broken
into many smaller services, each one
serving a single purpose and capable
of failing independently. A retail application
might have one service for
rendering the storefront, another for
thumbnails, and more for currency
conversion, checkout, address normalization,
and surveys. The dependencies
between them cross failure domains.
In her 2017 Velocity NY Conference
talk, Sarah Wells of the Financial Times
explained how her development teams
manage more than 150 microservices—and that is for just one part of
the Financial Times’s technical estate.
Squarespace is in the process of breaking down its monolith⁴ and already has
more than 30 microservices. Larger
companies such as Google, Netflix,
and Twitter often have thousands of
microservices, pushing the problem of
dependency management beyond human capabilities.
Microservices offer many advantages. They allow independent component releases, smoother rollbacks, and
polyglot development, as well as allow
teams to specialize in one area of the
codebase. However, they are not easy to
keep track of. In a company with more
than 100 microservices, it is unlikely
that employees could draw a diagram
and get it right, or guarantee they are
making dependency decisions that will
not result in a cycle.
Both monolithic services and
microservices can experience bootstrapping issues caused by hidden
dependencies. They rely on access to
decryption keys, network, and power.
They may also depend on external
systems such as DNS (Domain Name
System). If individual endpoints of a
monolith are reached via DNS, the process of keeping those DNS records up
to date may create a cycle.
The adoption of SaaS (software as
a service) creates new dependencies
whose implementation details are hidden. These dependencies are subject
to the latency, SLO, testing, and security concerns mentioned previously. Fail-
Figure 1. Sample dependency diagram.
Figure 2. Cycle removal.