1. Brewer, E. Towards robust distributed systems. In Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (2000).
2. Bronson, N. et al. TAO: Facebook’s distributed data store for the social graph. In Proceedings of the Usenix Annual Technical Conference (2013).
3. Cham, J. PhD Comics (June 2013); http://www.
4. Corbett, J.C. et al. Spanner: Google’s globally distributed database. ACM Transactions on Computer Systems 31, 3 (2013).
5. DeCandia, G. et al. Dynamo: Amazon’s highly available key-value store. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (2007), 205–220.
6. Gibson, G., Grider, G., Jacobson, A. and Lloyd, W. PRObE: A thousand-node experimental cluster for computer systems research. Usenix ;login: 38, 3 (2013).
7. Gilbert, S. and Lynch, N. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant Web services. ACM SIGACT News 33, 2 (2002), 51–59.
8. Lakshman, A. and Malik, P. Cassandra—a decentralized structured storage system. In The 3rd ACM SIGOPS International Workshop on Large-scale Distributed Systems and Middleware (2009).
9. Lamport, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 7 (July 1978), 558–565.
10. Lipton, R.J. and Sandberg, J.S. PRAM: A scalable shared memory. Technical Report TR-180-88, Princeton University, Department of Computer Science, 1988.
11. Lloyd, W., Freedman, M.J., Kaminsky, M. and Andersen, D.G. Don’t settle for eventual: Scalable causal consistency for wide-area storage with COPS. In Proceedings of the 23rd Symposium on Operating Systems Principles (2011), 401–416.
12. Lloyd, W., Freedman, M.J., Kaminsky, M. and Andersen, D.G. Stronger semantics for low-latency geo-replicated storage. In Proceedings of the 10th Usenix Conference on Networked Systems Design and Implementation (2013), 313–328.
13. Lloyd, W. Stronger consistency and semantics for low-latency geo-replicated storage. Ph.D. dissertation, Princeton University, 2013.
14. Santora, M. In hours, thieves took $45 million in ATM scheme. New York Times (May 9, 2013).
15. Sovran, Y., Power, R., Aguilera, M.K. and Li, J. Transactional storage for geo-replicated systems. In Proceedings of the 23rd Symposium on Operating Systems Principles (2011), 385–400.
16. Voldemort, 2013; http://project-voldemort.com.
Wyatt Lloyd is a postdoctoral researcher at Facebook and will begin a position as an assistant professor at the University of Southern California. His research interests include the distributed systems and networking problems that underlie the architecture of large-scale websites, cloud computing, and big data.
Michael J. Freedman is an associate professor of
computer science at Princeton University, with a research
focus on distributed systems, networking, and security.
Michael Kaminsky is a senior research scientist at Intel
Labs and is an adjunct faculty member of the computer
science department at Carnegie Mellon University. He is
part of the Intel Science and Technology Center for Cloud
Computing based in Pittsburgh, PA.
David G. Andersen is an associate professor of computer science at Carnegie Mellon University. In 1995, he cofounded an Internet service provider in Salt Lake City.
Copyright held by Owner(s)/Author(s). Publication rights
licensed to ACM. $15.00.
because all reads ask for a current
value or an old value. It is low latency
because it requires at most two nonblocking rounds of parallel reads. It
is partition tolerant because all reads
are in the local datacenter (partitions
are assumed to occur only in the wide
area, not in the local datacenter). It is
scalable because it is fully decentralized. Finally, it is performant because
it normally takes only a single round of
parallel reads and only two rounds of
reads in the worst case.
Our previous work on Eiger12 has
more details on how to choose the effective time, how to limit server-side
storage of old versions, and an algorithm for write-only transactions that
also maintains all the ALPS properties.
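
As a rough illustration of the protocol’s shape, the sketch below shows a two-round read-only transaction in Python. It assumes each first-round read returns the value together with the logical interval over which that version is known to be valid; the store object and its read_current/read_at_time methods are illustrative stand-ins, not Eiger’s actual API.

from concurrent.futures import ThreadPoolExecutor

def read_only_txn(store, keys):
    # Round 1: read the current version of every key in parallel. Each result
    # is assumed to carry [earliest_valid, latest_valid], the logical interval
    # over which the returned version is known to be valid (latest_valid is
    # the server's time at the read if the version is still current).
    with ThreadPoolExecutor() as pool:
        first = dict(zip(keys, pool.map(store.read_current, keys)))

    # Pick an effective time: one logical instant at which the whole read set
    # can be made mutually consistent (here, the largest earliest-valid time).
    effective_time = max(r.earliest_valid for r in first.values())

    # Round 2, only if needed: for keys whose round-1 version is not known to
    # be valid at the effective time, fetch the version valid exactly then.
    # Servers must keep old versions around for a bounded window to answer.
    stale = [k for k, r in first.items() if r.latest_valid < effective_time]
    results = dict(first)
    if stale:
        with ThreadPoolExecutor() as pool:
            older = pool.map(lambda k: store.read_at_time(k, effective_time), stale)
            results.update(zip(stale, older))

    return {k: r.value for k, r in results.items()}

Both rounds are parallel reads against local servers, which is what makes the common case a single nonblocking round and the worst case two.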
The cost of causal consistency and limited transactions. For one realistic view of Eiger’s overhead, we parameterized a synthetic workload based upon Facebook’s production TAO (The Associations and Objects) system.2 We then compared Eiger’s throughput with that of the eventually consistent Cassandra from which it was forked, in an experiment with clusters of eight servers each in Washington and California. The Cassandra setup achieved 23,657 operations per second that touched 498,239 data locations per second on average. The Eiger setup, with causal consistency and all inconsistent batch operations converted to read or write transactions, achieved 22,891 operations per second that touched 480,904 data locations per second on average. This experiment shows that, for this real-world workload, Eiger’s causal consistency and stronger semantics do not impose significant overhead.
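
A quick check of the overhead these numbers imply (roughly three percent on both measures):

cassandra_ops, eiger_ops = 23657, 22891      # operations per second
cassandra_locs, eiger_locs = 498239, 480904  # data locations touched per second

print(f"operations/s overhead: {1 - eiger_ops / cassandra_ops:.1%}")   # ~3.2%
print(f"locations/s overhead:  {1 - eiger_locs / cassandra_locs:.1%}") # ~3.5%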
To demonstrate the scalability of Eiger, we ran the Facebook TAO workload on N client machines that fully loaded an N-server cluster that is replicating writes to another N-server cluster (that is, the N=128 experiment involves 384 machines). This experiment was run on PRObE’s Kodiak testbed,6 which provides an Emulab interface with exclusive access to hundreds of machines. Figure 12 shows the throughput for Eiger as N scales from eight to 128 servers per cluster. The bars show throughput normalized against the throughput of the eight-server cluster. Eiger scales out as the number of servers increases; each doubling of the number of servers increases cluster throughput by 96% on average.
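
For perspective, compounding that stated average over the four doublings from eight to 128 servers implies the following overall speedup relative to the eight-server baseline (an illustration of the average, not a separate measurement):

per_doubling = 1.96      # average throughput multiplier per doubling of servers
doublings = 4            # 8 -> 16 -> 32 -> 64 -> 128 servers per cluster
ideal = 2 ** doublings   # perfect linear scaling would give 16x

achieved = per_doubling ** doublings
print(f"implied speedup: {achieved:.1f}x ({achieved / ideal:.0%} of ideal {ideal}x)")
# implied speedup: 14.8x (92% of ideal 16x)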
More information is available in our papers on COPS11 and Eiger,12 and in Wyatt Lloyd’s dissertation.13 The code for Eiger is available from https://github.
This work was supported by funding
from National Science Foundation
Awards CSR-0953197 (CAREER), CCF-
0964474, CNS-1042537 (PRObE), and
CNS-1042543 (PRObE); and by Intel via
the Intel Science and Technology Center for Cloud Computing (ISTC-CC).
Figure 12. Eiger scales linearly.