Even if everything is working perfectly, the communication cost of constant coordination might simply be too much.
Giving up strict coordination in exchange for performance is a well-known trade-off in many areas of computing, including CPU architecture, multithreaded programming, and DBMS design. Quite often this has turned into a game of finding out just how little coordination is really needed to provide unsurprising behavior to users. While designers of some concurrent-but-local systems have developed quite a collection of tricks for managing just enough coordination (for example, the RDBMS field has a long history of interesting performance hacks, often resulting in systems that are far less ACID (that is, atomicity, consistency, isolation, durability) than you might guess [2]), the exploration of such techniques for general distributed systems is just beginning.
This is an exciting time, as the subject of how to make these trade-offs in distributed systems design is just now starting to be studied seriously. One place where this topic is getting the attention it deserves is the Berkeley Orders of Magnitude (BOOM) team at the University of California, Berkeley, where multiple researchers are taking different but related approaches to understanding how to make disciplined trade-offs [4]. They are breaking new ground in knowing when and how you can safely decide not to care about “now” and thus not pay the costs of coordination. Work like this may soon lead to a greater understanding of exactly how little synchronized time we really need in order to do our work. If distributed systems can be built with less coordination while still providing all of the safety properties they need, they may be faster, more resilient, and more able to scale.
Another important area of research on avoiding the need to worry about time or coordination involves conflict-free replicated data types (CRDTs). These are data structures whose updates never need to be ordered and so can be used safely without trying to tackle the problem of time. They provide a kind of safety that is sometimes called strong eventual consistency: all hosts in a system that have received the same set of updates, regardless of order, will have the same state. It has long been known that this can be achieved if all of the work you do has certain properties, such as being commutative, associative, and idempotent. What makes CRDTs exciting is that researchers in the field [16] are expanding our understanding of how expressive we can be within that limitation and how inexpensively such work can be done, while providing a rich set of data types that work off the shelf for developers.
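To make the order-free property concrete, here is a minimal sketch (in Python; mine, not from the CRDT literature cited above) of one of the simplest CRDTs, a grow-only counter. Each replica increments only its own entry, and merging takes the pointwise maximum, an operation that is commutative, associative, and idempotent.

# Minimal G-Counter sketch. Replica IDs ("a", "b") are made up for the demo.
class GCounter:
    def __init__(self):
        self.counts = {}  # replica id -> that replica's increment count

    def increment(self, replica_id, amount=1):
        self.counts[replica_id] = self.counts.get(replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        """Pointwise max of the two count vectors; safe to apply in any
        order, any number of times."""
        merged = GCounter()
        for rid in self.counts.keys() | other.counts.keys():
            merged.counts[rid] = max(self.counts.get(rid, 0),
                                     other.counts.get(rid, 0))
        return merged

# Two replicas update independently, then exchange state in either order.
a, b = GCounter(), GCounter()
a.increment("a"); a.increment("a")
b.increment("b")
assert a.merge(b).value() == b.merge(a).value() == 3  # order does not matter
assert a.merge(b).merge(b).value() == 3               # re-delivery is harmless

Because the merge has those three properties, replicas can exchange state in any order and any number of times and still converge; richer off-the-shelf types such as sets, registers, and maps follow the same recipe with more bookkeeping.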
One way to tell that the development of these topics is just beginning is the existence of popular distributed systems that prefer ad hoc hacks over the best-known techniques for dealing with their problems of consistency, coordination, or consensus. One example is any distributed database with a “last write wins” policy for resolving conflicting writes. Since we already know that “last” by itself is a meaningless term, for the same reason that “now” is not a simple value across the whole system, this is really a “many writes, chosen unpredictably, will be lost” policy, but that would not sell as many databases, would it? Even though the state of the art is still rapidly improving, anyone should be able to do better than that.
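To see why wall-clock “last” is arbitrary, consider this small sketch (the node names, timestamps, and two-second skew are all hypothetical) of a last-write-wins merge under clock skew:

# Why "last write wins" really means "some writes lose, unpredictably."
def lww_merge(a, b):
    """Keep whichever (timestamp, value) pair carries the larger timestamp."""
    return a if a[0] >= b[0] else b

# Two replicas accept writes to the same key. In real time, node B writes
# *after* node A, but node B's clock runs two seconds slow, so its write
# carries the *smaller* timestamp.
write_a = (100.0, "value-from-A")        # stamped at t=100.0 by node A
write_b = (100.5 - 2.0, "value-from-B")  # happened at t=100.5, stamped 98.5

print(lww_merge(write_a, write_b))  # (100.0, 'value-from-A'): B's newer write is silently lost

Nothing in lww_merge can distinguish clock skew from real ordering; whichever write happens to carry the larger stamp wins, regardless of which write actually happened last.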
Another example, just as terrible as the “most writes lose” database strategy, is choosing to solve cluster management with ad hoc coordination code instead of a formally founded and well-analyzed consensus protocol. If you really do need something other than one of the well-known consensus protocols to solve the problem they solve (hint: you don’t), then at a very minimum you ought to do what the ZooKeeper/Zab implementers did and document your goals and assumptions clearly. That will not save you from disaster, but it will at least assist the archaeologists who come later to examine your remains.
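For contrast, here is a hedged sketch of delegating the hard part to ZooKeeper rather than writing coordination code yourself, using the kazoo Python client’s leader-election recipe. The connection string, election path, and participant identifier are made-up values, and a running ZooKeeper ensemble is assumed.

# Leader election via ZooKeeper instead of ad hoc coordination code.
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")  # hypothetical ensemble address
zk.start()

def lead():
    # Runs only while this process holds leadership; returning
    # relinquishes it.
    print("elected leader; doing the coordinated work here")

# Blocks while contending for leadership and calls lead() on winning.
election = zk.Election("/myapp/election", "worker-1")  # hypothetical path/id
election.run(lead)

The point is not this particular library but that the consensus itself is delegated to a protocol (Zab) whose goals and assumptions have been documented and analyzed.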
This is a very exciting time to be a
builder of distributed systems. Many
changes are still to come. Some truths,
however, are very likely to remain. The
idea of “now” as a simple value that has
meaning across a distance will always
be problematic. We will continue to
need more research and more practical
experience to improve our tools for living with that reality. We can do better.
We can do it now.
Thanks to those who provided feedback, including Peter Bailis, Christopher Meiklejohn, Steve Vinoski, Kyle
Kingsbury, and Terry Coatta.
Related articles on queue.acm.org

If You Have Too Much Data, then “Good Enough” Is Good Enough
There’s Just No Getting around It: You’re Building a Distributed System
The Network is Reliable
Peter Bailis and Kyle Kingsbury

References
1. Aphyr. Call me maybe: ZooKeeper, 2013; http://aphyr.
2. Bailis, P. When is “ACID” ACID? Rarely, 2013; http://
3. Bailis, P. and Kingsbury, K. The network is reliable. ACM Queue 12, 7 (2014); http://queue.acm.org/detail.
4. BOOM; http://boom.cs.berkeley.edu/papers.html.
5. Brewer, E.A. Towards robust distributed systems.
6. Corbett, J.C. et al. Spanner: Google’s globally distributed database. In Proceedings of the 10th Symposium on Operating System Design and Implementation, 2012; http://research.google.com/archive/spanner.html; http://static.googleusercontent.
7. Deutsch, P. Eight fallacies of distributed computing.
8. Fischer, M.J., Lynch, N.A. and Paterson, M.S. Impossibility of distributed consensus with one faulty process. JACM 32, 2 (1985), 374–382.
9. Gilbert, S. and Lynch, N. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant Web services. ACM SIGACT News 33, 2 (2002), 51–59.
10. Junqueira, F.P., Reed, B.C. and Serafini, M. Zab: High-performance broadcast for primary-backup systems, 2011.
11. Kulkarni, S., Demirbas, M., Madeppa, D., Bharadwaj, A. and Leone, M. Logical physical clocks and consistent snapshots in globally distributed databases; http://
12. Lamport, L. The part-time parliament. ACM Trans. Computer Systems 16, 2 (1998), 133–169; http://
13. Medeiros, A. ZooKeeper’s atomic broadcast protocol: Theory and practice; http://www.tcs.hut.fi/
14. NTP; http://www.ntp.org.
15. Rotem-Gal-Oz, A. Fallacies of distributed computing.
16. SyncFree; https://syncfree.lip6.fr/index.php/
Justin Sheehy (@justinsheehy) is the site leader for VMware’s Cambridge, MA, R&D center and also plays a strategic role as architect for the company’s Storage & Availability business. His three most recent positions before joining VMware were as CTO for Basho, Principal Scientist at MITRE, and Senior Architect at Akamai.
Copyright held by owner.
Publication rights licensed to ACM. $15.00