ternate index using the primary index.
That takes you to the same transactional scope. If you start without the primary index and have to search all of the
transactional scopes, each alternate
index lookup must examine an almost-infinite number of scopes as it looks
for the match to the alternate key. This
will eventually become untenable.
The only logical alternative is to do
a two-step lookup. First, look up the alternate key, which yields the entity key.
Second, access the entity using the entity key. This is very much like inside a
relational database as it uses two steps
to access a record via an alternate key.
But the premise of almost-infinite scaling means the two indices (primary
and alternate) cannot be known to reside in the same transactional scope
(see Figure 4).
The scale-agnostic application program
can’t atomically update an entity and its
alternate index. The upper-layer scale-agnostic application must be designed
to understand that alternate indices
may be out of sync with the entity accessed with its primary index (that is,
entity key). As shown in Figure 4, different keys (primary entity key versus
alternate keys) cannot be collocated or
What in the past has been managed
automatically as alternate indices must
now be managed manually by the application. Workflow-style updates via
asynchronous messaging are all that
are left. When you read data from alternate indices, you must understand
that it is potentially out of sync with the
entity itself. Alternate indices are now
harder. This is a fact of life in the big
cruel world of huge systems.
Messaging Across Entities
This section considers connecting independent entities using messages. It
examines naming, transactions and
messages, message-delivery semantics, and the impact of repartitioning
Messages to communicate across
entities. If you can’t update the data
across two entities in the same transaction, you need a mechanism to update
the data in different transactions. The
connection between the entities is via
Atomic transactions and entities. In
scalable systems, you can’t assume transactions for updates across these different
entities. Each entity has a unique key,
and each entity is easily placed into one
transactional scope. Recall the premise
that almost-infinite scaling causes the
number of entities inexorably to increase,
but size of the individual entity remains
small enough to fit in a transactional
scope (that is, one computer).
How can you know that two separate
entities are guaranteed to be within the
same transactional scope and, hence,
atomically updatable? You know only
when a single unique key unifies both.
Now it is really one entity!
If hashing is used for partitioning
by entity key, there’s no telling when
two entities with different keys land on
the same box. If key-range partitioning
is used for the entity keys, most of the
time the adjacent key values reside on
the same machine. Once in a while you
will get unlucky and your neighbor will
be on another machine.
A simple test case that counts on atomicity with a neighbor in a key-range
partitioning will usually succeed. Later,
when redeployment moves the entities
across machines the latent bug emerges; the updates are no longer atomic.
You can never count on different entity-key values residing in the same place.
Put more simply, the lower layer of
the application will ensure that each
entity key (and its entity) resides on a
single machine. Different entities may
A scale-agnostic programming ab-
straction must have the notion of entity
as the boundary of atomicity. Under-
standing entities, the use of the entity
key, and the clear commitment to a
lack of atomicity across entities are es-
sential to scale-agnostic programming.
Large-scale applications implicitly
do this in the industry today; there
just isn’t a name for the concept of
an entity. From an upper-layer app’s
perspective, it must assume that the
entity is the scope of transactions. As-
suming more will break when the de-
Consider alternate indices. We are ac-
customed to the ability to address data
with multiple keys or indices. For exam-
ple, sometimes a customer is referenced
by Social Security number, sometimes
by credit-card number, and sometimes
by street address. Assuming extreme
amounts of scaling, these indices can-
not reside on the same machine or in
a single large cluster. The data about a
single customer cannot be known to reside
within a single transactional scope. The
entity itself resides in a single transac-
tional scope. The challenge is the copies
of the information used to create an al-
ternate index must be assumed to reside
in a different transactional scope.
Consider guaranteeing the alternate
index resides in the same transactional
scope. When almost-infinite scaling
kicks in, the set of entities is smeared
across gigantic numbers of machines.
The primary index and alternate index
information must reside within the
same transactional scope. The only
way to ensure this is to locate the al-
Figure 3. Entities spread across different transactional scopes.