Even today, 10 years after this paper was first written, real system developers rarely try to achieve strongly consistent transactions over more than just a few computers. Instead, they assume multiple separate transaction scopes. Each computer is a separate scope with local transactions inside.
Most applications use at-least-once messaging. TCP/IP is great if you are a short-lived Unix-style process, but consider the dilemma faced by an application developer whose job is to process a message and modify some durable data represented in a database. The message is consumed but not yet acknowledged. The database is updated, and then the message is acknowledged. After a failure, this work is restarted and the message is processed again.

The dilemma derives from the fact that message delivery is not directly coupled to the update of the durable data other than through application action. While it is possible to couple the consumption of messages to the update of the durable data, such coupling is not commonly available. Its absence leads to failure windows in which the message is delivered more than once. Rather than lose messages, the message plumbing delivers them at least once.

A consequence of this behavior is that the application must tolerate message retries and out-of-order delivery.
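The standard defense against this failure window is to make the handler idempotent. The sketch below uses hypothetical names and an in-memory stand-in for the database, but the pattern it shows is the usual one: record the message's unique ID in the same local transaction that applies the update, so a redelivered message becomes a no-op.

```python
# Sketch: making an at-least-once consumer idempotent.
# The message IDs, account data, and storage are illustrative stand-ins.

processed_ids = set()   # persisted alongside the data in a real system
balances = {"ABC": 0}   # durable application data

def handle(msg_id, account, amount):
    """Apply a message's update exactly once, even if redelivered."""
    if msg_id in processed_ids:      # duplicate from a retry window
        return "duplicate-ignored"
    # In a real system, both writes below commit in ONE local transaction.
    balances[account] += amount
    processed_ids.add(msg_id)
    return "applied"

# First delivery applies the update; the retry after a failure is ignored.
handle("m-1", "ABC", 50)
handle("m-1", "ABC", 50)   # redelivered
```

Because the dedup record and the data change commit together, a crash between them cannot leave the update applied twice.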
Opinions to be Justified
The nice thing about writing an opinion piece is that you can express wild opinions. Here are a few that this article tries to justify.
Scalable apps use uniquely identified entities. This article argues that the upper-layer code for each application must manipulate a single collection of data called an entity. There are no restrictions on the size of a single entity except that it must live within a single transactional scope (that is, one machine).

Each entity has a unique identifier or key, as shown in Figure 2. An entity key may be of any shape, form, or flavor. Somehow, it must uniquely identify exactly one entity and the data it contains.
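This idea can be sketched in a few lines; the keys and fields below are hypothetical (a real key might be a GUID, a customer number, or anything else unique), but the point is that every access to an entity's data goes through its one key.

```python
# Sketch: entities as uniquely keyed collections of data.
# The keys and record shapes here are illustrative only.

entities = {
    "ABC": {"orders": [101, 102]},
    "WPB": {"orders": []},
}

def lookup(key):
    """The key identifies exactly one entity and the data it contains."""
    return entities[key]

customer = lookup("ABC")   # one key -> one entity, all in one scope
```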
There are no restrictions on the representations of the entity. It may be represented as SQL records, XML, JSON, files, or anything else.

Almost-infinite scaling implies using a new abstraction called an entity as you write your program. An entity lives on a single machine at a time, and the application can manipulate only one entity at a time. A consequence of almost-infinite scaling is that this programmatic abstraction must be exposed to the developer of the business logic.
By naming and discussing this as-yet-unnamed concept, we can perhaps agree on a consistent programmatic approach and a consistent understanding of the issues involved in building scalable systems.
Furthermore, the use of entities has implications for the messaging patterns used to connect them. This leads to the creation of state machines that cope with the message-delivery inconsistencies foisted upon innocent application developers attempting to build scalable solutions to business problems.
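One way to read "state machines" here is sketched below, with hypothetical state and event names: each entity tracks how far its conversation has progressed, and transitions only move forward, so a duplicated or late-arriving message lands the entity in the same state instead of corrupting it.

```python
# Sketch: a per-entity state machine tolerant of duplicated and
# reordered messages. States and event names are illustrative.

ORDER_STATES = ["created", "paid", "shipped"]

def advance(current, event):
    """Apply an event; duplicate and stale events leave state unchanged."""
    target = {"pay": "paid", "ship": "shipped"}.get(event)
    if target is None:
        return current
    # Only move forward; a replayed or out-of-order message cannot
    # move the entity backward.
    if ORDER_STATES.index(target) > ORDER_STATES.index(current):
        return target
    return current

state = "created"
for event in ["pay", "pay", "ship", "pay"]:   # duplicate, then stale retry
    state = advance(state, event)
```

The duplicated "pay" and the "pay" arriving after "ship" are both absorbed harmlessly; the entity ends in the same state as with clean delivery.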
Consider the following three assumptions, which are asserted and not justified; assume they are true for the purposes of this discussion.
Layers of the application and scale agnosticism. Let's begin by presuming that each scalable application has at least two layers, as shown in Figure 1. These layers differ in their perception of scaling. They may have other differences, but those are not relevant to this discussion.
The lower layer of the application understands that more computers get added to make the system scale. In addition to other work, it manages the mapping of the upper layer's code to the physical machines and their locations. The lower layer is scale-aware in that it understands this mapping. I presume that the lower layer provides a scale-agnostic programming abstraction to the upper layer. There are many examples of scale-agnostic programming abstractions, including MapReduce [3].
Using this scale-agnostic programming abstraction, the upper layer of the application code is written without worrying about scaling issues. By sticking to the scale-agnostic programmatic abstraction, you can write application code that need not worry about the changes that happen as it is deployed over an ever-increasing load. Over time, the lower layer of these applications may evolve to become a new platform or middleware that simplifies the creation of scale-agnostic APIs.
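A toy version of this division of labor might look as follows; the machine names and hash scheme are assumptions, not anything prescribed by the article. The upper layer names only an entity key, while the lower layer owns the key-to-machine mapping and can change it as machines are added.

```python
import hashlib

# Sketch: the scale-aware lower layer maps entity keys to machines.
# Machine names and the hashing scheme are illustrative assumptions.

machines = ["node-0", "node-1"]

def locate(key):
    """Deterministically pick the machine currently hosting a key."""
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return machines[digest % len(machines)]

# Scaling out: the lower layer adds a machine and remaps some keys.
# Upper-layer code that calls locate("ABC") is completely unchanged.
machines.append("node-2")
```

The scale-agnostic contract is the `locate` signature; everything behind it, including rebalancing, stays the lower layer's business.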
Transactional scopes. A great deal of academic work has been done on the notion of providing strongly consistent transactions over distributed systems. This includes 2PC (two-phase commit) [1], Paxos [5], and more recently Raft [6]. Classic 2PC will block when a machine fails unless the coordinator and participants in the transaction are fault tolerant in their own right, as in the Tandem NonStop System. Paxos and Raft do not block on node failures, but they do extra coordination work, much like Tandem's system.
These algorithms can be described as providing strongly consistent transactions over distributed systems. Their goal is to allow arbitrary atomic updates to data spread over many machines. Updates exist in a single transactional scope spanning many machines.
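The shape of classic 2PC can be seen in a toy sketch (the classes and votes below are simulated; real implementations add durable logs, timeouts, and recovery). The coordinator commits only on a unanimous yes vote, and a participant that has voted yes must then wait for the decision, which is exactly where a coordinator failure blocks everyone.

```python
# Sketch: a toy two-phase commit coordinator, votes only.

def two_phase_commit(participants):
    """Phase 1: collect prepare votes. Phase 2: commit only on unanimous yes."""
    votes = [p.prepare() for p in participants]   # phase 1: prepare
    decision = "commit" if all(votes) else "abort"
    for p in participants:                        # phase 2: broadcast decision
        p.finish(decision)
    return decision

class Participant:
    def __init__(self, can_commit):
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self):
        # After voting yes, the participant is stuck in "prepared"
        # until the coordinator's decision arrives.
        self.state = "prepared"
        return self.can_commit

    def finish(self, decision):
        self.state = decision
```

A single no vote (or an unreachable participant) forces an abort for everyone, which is why spanning many machines with one scope is so fragile.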
Unfortunately, in many circumstances this is not an option for an application developer. Applications may need to span trust boundaries, different platforms, and different operational and deployment zones. What happens when you "just say no" to distributed transactions?
Figure 1. Two-layered application.
Figure 2. Data for an application comprises uniquely keyed entities (e.g., entities with keys "ABC", "WPB", "QLA", and "UNB").