leading technique to support scalable
Finally, different applications demand different behaviors from durable
state. Do you want it right or do you
want it right now Human beings usually want an answer right now rather than
right. Many application solutions based
on object identity may be tolerant of
stale versions. Immutable objects can
provide the best of both worlds by being
both right and right now.
Consider your application’s requirements carefully. If you are not careful,
you will have problems with your state
that you will definitely mind.
Mihir Nanavati et al.
Network Applications Are Interactive
Storage Systems: Not Just
a Bunch of Disks Anymore
1. Dean, J. and Barosso, L. A. The tail at scale. Commun.
ACM 56, 2 (Feb. 2013), 74–80.
2. DeCandia, G. et al. Dynamo: Amazon’s highly available
key-value store. In Proceedings of the 21st ACM
SIGOPS Symp. Operating System Principals, 2007,
3. Garcia-Molina, H. and Salem, K. Sagas. In Proceedings
of the ACM SIGMOD Conf. Management of Data,
1987, 249–259; https://www.cs.cornell.edu/andru/
4. Gray, J. and Reuter, A. Transaction Processing: Concepts
and Techniques. Morgan Kaufmann, Burlington, MA,
5. Helland, P. Data on the outside versus data on the
inside. In Proceedings of the Conf. Innovative Database
6. Helland, P. Fiefdoms and emissaries, 2002; download.
7. Helland, P. Idempotence is not a medical condition.
acmqueue 10, 4 (2012), 56–65.
8. Helland, P. Immutability changes everything.
acmqueue 13, 9 (2016); https://queue.acm.org/detail.
9. Helland, P. Life beyond distributed transactions.
acmqueue 14, 5 (2016); https://queue.acm.org/detail.
10. Helland, P. Standing on distributed shoulders of giants.
acmqueue 14, 2 (2016); https://queue.acm.org/detail.
11. Lakshman, A. and Malik, P. Cassandra: A decentralized
structured storage system. ACM SIGOPS Operating
Systems Review 44, 2 (2010), 35–40.
12. von Neumann, J. First draft of a report on the EDVAC.
IEEE Annals of the History of Computing 15, 4 (1993),
Pat Helland has been implementing transaction systems,
databases, application platforms, distributed systems,
fault-tolerant systems, and messaging systems since 1978.
He currently works at Salesforce.
Copyright held by owner/author.
Publication rights licensed to ACM. $15.00.
asynchronously, meaning it is not
surprising to read a new version of the
description, retry, and then get an old
version from a cache replica that’s not
User lookups are very sensitive to latency. Just as shopping cart response
times must be fast, product-catalog
lookups must be fast. It is common for
a client working to display the description of a product to wait for an answer,
time out, and retry to a different replica
if necessary to ensure the latency for the
response is fast.
Note the management of the short
latency depends on the fact that any version of the product-catalog description is
OK. This is another example of the business needing an answer right now more
than it needs the answer to be right.
Search. Say you are building a search
system for the contents of the Web. Web
crawlers feed search indexers. Each
document is given a unique ID. Search
terms are identified for each document.
The index terms are assigned to a shard.
Updates to the index are not super
latency-sensitive. Mostly, changes observed by crawling the Web are not latency-sensitive. Other than time-sensitive
news feeds, the changes need not be immediately visible. When a random document is produced at some remote location in the world, it might take a while to
Search results are, however, sensitive
to latency. In general, a search request
from a user is fed into servers that ask all
of the shards for matching results. This
looks a lot like the product catalog depicted in Figure 9, but the user requests
hit all the shards, not just one of them.
It’s very important that searches
get quick results, or users will get frus-
trated. This is aggravated by the need
to hear back from all the servers. If any
server is a laggard, the response is de-
layed. The mechanism for coping with
this at Google is beautifully described in
the 2013 article “The Tail at Scale.” 1
In search, it is OK to get stale answers, but the latency for the response
must be short. There’s no notion of linearizable reads nor of read-your-writes.
Search clearly needs to return answers
right now even if they are not right.
It’s about the application pattern.
Each application pattern shows different characteristics and trade-offs,
shown in Figure 10.
State means different things. Session
state captures stuff across requests but
not across failures. Durable state remembers stuff across failures.
Increasingly, most scalable computing consists of microservices with
stateless interfaces. Microservices need partitioning, failures, and rolling upgrades,
and this implies that stateful sessions
are problematic. Microservices may call
other microservices to read data or get
Transactions across stateless calls are
usually not supported in microservice
solutions. Microservices and their load-balanced service pools make server-side
session state difficult, which, in turn,
makes it difficult to have transactions
across calls and objects. Without transactions, coordinated changes across
objects in durable state need to use the
careful replacement technique in which
updates are ordered, confirmed, and
idempotent. This is challenging to program but is a natural consequence of microservices, which have emerged as the
Figure 10. Application pattern trade-offs.
ment with k/v
No No Yes Works across
Yes Yes Immutable Non-linearizable
Yes Yes No Sometimes give
Yes No No Scalable cache means
that stale is ok
Search Yes No No Scalable cache