sible representation would be a collection of SQL records, potentially across
many tables, whose primary key has
the entity key as its prefix.
Entities represent disjoint sets of
data. Each datum resides in exactly
An application consists of many entities. For example, an order-process-ing application encapsulates many
orders, each of which is identified by
a unique Order-ID. To be a scalable or-der-processing application, data from
one order must be disjoint from data
for other orders.
Atomic transactions cannot span
entities. Each computer is assumed to
be a separate transactional scope. Later this article presents the argument
that atomic transactions cannot span
entities. The programmer must always
stick to the data contained inside a single entity for each transaction.
From the programmer’s perspective, the uniquely identified entity is the
transactional scope. This concept has
a powerful impact on the behavior of
applications designed for scaling. One
implication to be explored is that alternate indices cannot be kept transactionally consistent when designing for
Messages are addressed to entities.
Most messaging systems do not consider the partitioning key for the data but
rather target a queue that is consumed
by a stateless process. Standard practice is to include some data in the message that informs the stateless application code where to get the data it needs.
This is the entity key. The data for the
entity is fetched from some database or
other durable store by the application.
A couple of interesting trends are
happening. First, the size of the set of
entities is growing larger than will fit
on a single computer. Each individual
entity usually fits in one computer, but
the set of them does not. Increasingly,
the stateless application is routing to
fetch the entity based on some partitioning scheme.
Second, the fetching and partitioning scheme is being separated into the
lower layers of the application. This is
deliberately isolated from the upper layers responsible for the business logic.
This pattern effectively targets the
entity by routing using the entity key.
Both the stateless Unix-style process
and the lower layers of the application
are simply part of the implementation
of the scale-agnostic API provided for
the business logic. The upper-layer
scale-agnostic business logic simply
addresses the message to the entity key
that identifies the durable state known
as the entity.
Entities manage per-partner state
(activities). Scale-agnostic messaging
is effectively entity-to-entity messaging.
The sending entity is manifest by its durable state and is identified by its entity
key. It sends a message to another entity and identifies it by its entity key. The
recipient entity consists of both scale-agnostic upper-layer business logic and
the durable data representing its state.
This is identified by its entity key.
Recall the assumption that messages are delivered at least once. The
recipient entity may be assailed with
redundant messages that must be ignored. In practice, messages fall into
two categories: those that affect the
state of the entity and those that do
not. Messages that don’t affect the entities state are easy—they are trivially
idempotent. Messages that change the
state require more care.
To ensure idempotence (that is,
guarantee that the processing of retried
messages is harmless), the recipient
entity is typically designed to remember that the message has been processed. Once it has been successfully
processed, repeated messages will typically generate another response matching the behavior of the first message.
The knowledge of the received message creates state that is wrapped up
on a per-partner basis. The important
observation is that the state is organized on a per-partner basis and each
partner is an entity.
The term activity is applied to the
state that manages the per-partner
messaging on each side of a two-party
relationship. Each activity lives in exactly one entity. An entity will have an
activity for each partner entity.
In addition to managing messaging
melees, activities are used to manage
loosely coupled agreements. In a world
where atomic transactions are not a
possibility, tentative operations are
used to negotiate a shared outcome.
These are performed between entities
and are managed by activities.
Building workflows to reach agree-
ment is fraught with challenges that are
well documented elsewhere. 7 This ar-
ticle does not assert that activities solve
these challenges, but rather that they
give a foundation for storing the state
needed to solve them. Almost-infinite
scaling leads to surprisingly fine-grained
workflow-style solutions. The partici-
pants are entities, and each entity man-
ages its workflow using specific knowl-
edge about the other entities involved.
That two-party knowledge maintained
inside an entity is called an activity.
Examples of activities are sometimes subtle. An order application
sends messages to the shipping application. It includes the shipping ID and
the sending order ID. The message
type may be used to stimulate the state
changes in the shipping application
to record that the specified order is
ready to ship. Frequently, implementers don’t design for retries until a bug
appears. Rarely, but occasionally, the
application designers think about and
plan for activities.
Disjoint transactional scopes. Each
entity is defined as a collection of data
with a unique key known to live within
a single transactional scope. Atomic
transactions may always be done
within a single entity.
Uniquely keyed entities. Code for
the upper layer of an application is naturally designed around collections of
data with a unique key. Customer IDs,
Social Security numbers, product SKUs,
and other unique identifiers can be
seen within applications. They are used
as keys to locate the applications’ data.
Guarantees of transactional atomicity
come only within an entity identified by
a unique key.
Repartitioning and entities. One of
the assumptions previously stated is
that the emerging upper layer is scale
agnostic and the lower layer decides
how the deployment evolves as its scale
changes. The location of a specific entity is likely to change as the deployment
evolves. The upper layer of the application cannot make assumptions about
the location of the entity because that
would not be scale agnostic.
As shown in Figure 3, entities are
spread across transactional scopes
using either hashing or key-range