ically updated across entities.
˲ Activities consist of the collection
of state within the entities used to
manage messaging relationships with
a single partner entity.
Workflow for reaching decisions operates within activities, which in turn live within entities.
It is the fine-grained nature of workflow that is surprising when looking at
Many applications are implicitly
being designed with both entities
and activities today. They are simply
not formalized, nor are they consistently used. Where the use is inconsistent, bugs are found and eventually
patched. By discussing and consistently using these patterns, better
large-scale applications can be built
and, as an industry, we can get closer
to building solutions that allow business-logic programmers to concentrate on the business problems rather
than the problems of scale.
Pat Helland has been implementing transaction systems,
databases, application platforms, distributed systems,
fault-tolerant systems, and messaging systems since
1978. He currently works at Salesforce.
Copyright held by author.
Publication rights licensed to ACM $15.00
order encumbering its items. As it connects items and orders, these will be
organized by item. Each item keeps
information about outstanding orders
for that item. Each activity within the
item (one per order) manages the uncertainty of the order.
When an entity agrees to perform
a tentative operation, it agrees to let
another entity decide the outcome.
It accepts uncertainty, which adds to
its confusion. As confirmations and
cancellations arrive, the uncertainty
decreases, reducing confusion. It is
normal to proceed through life with
uncertainty that rises and falls as old problems get resolved
and new ones arrive in your lap.
Again, this is simply workflow, but
it is fine-grained workflow with entities
as the participants.
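The tentative, confirm, and cancel steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `ItemEntity`, `OrderActivity`, `reserve`, `confirm`, and `cancel` are hypothetical and do not come from the article, and the choice to treat a repeated request for the same order as already handled is an assumption of the sketch. Every method touches the state of one item only, so no update spans entities.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Outcome(Enum):
    TENTATIVE = "tentative"
    CONFIRMED = "confirmed"
    CANCELLED = "cancelled"


@dataclass
class OrderActivity:
    """What this item remembers about one partner order (hypothetical shape)."""
    order_key: str
    quantity: int
    outcome: Outcome = Outcome.TENTATIVE


@dataclass
class ItemEntity:
    """One entity; all changes below stay inside this item's own state."""
    item_key: str
    on_hand: int
    # One activity per order, keyed by the partner entity's key.
    activities: Dict[str, OrderActivity] = field(default_factory=dict)

    def _tentative(self):
        return [a for a in self.activities.values()
                if a.outcome is Outcome.TENTATIVE]

    def available(self) -> int:
        # Tentative reservations encumber stock until the order decides.
        return self.on_hand - sum(a.quantity for a in self._tentative())

    def uncertainty(self) -> int:
        # Number of outcomes this item has agreed to let someone else decide.
        return len(self._tentative())

    def reserve(self, order_key: str, quantity: int) -> bool:
        existing = self.activities.get(order_key)
        if existing is not None:
            # Assumption: a repeated request is answered from remembered state.
            return existing.outcome is not Outcome.CANCELLED
        if quantity > self.available():
            return False
        self.activities[order_key] = OrderActivity(order_key, quantity)
        return True

    def confirm(self, order_key: str) -> None:
        activity = self.activities[order_key]
        if activity.outcome is Outcome.TENTATIVE:
            self.on_hand -= activity.quantity
            activity.outcome = Outcome.CONFIRMED

    def cancel(self, order_key: str) -> None:
        activity = self.activities[order_key]
        if activity.outcome is Outcome.TENTATIVE:
            activity.outcome = Outcome.CANCELLED
```

In this sketch, each accepted reservation raises `uncertainty()` by one, and each confirmation or cancellation lowers it again, which mirrors the rise and fall of uncertainty described in the text.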
Uncertainty and almost-infinite
scaling. The management of uncertainty usually revolves around two-party agreements. There may be multiple two-party agreements. These are
knit together as a web of fine-grained
two-party agreements using entity keys
as the links and activities to track the
known state of a distant partner.
Consider a house purchase and the relationships with the escrow company.
The buyer enters into an agreement of
trust with the escrow company, as do the
seller, mortgage company, and all other
parties involved in the transaction.
When you go to sign papers to buy a
house, you do not know the outcome of
the deal. You accept that, until escrow
closes, you are uncertain. The only party with control over the decision-making is the escrow company.
This is a hub-and-spoke collection
of two-party relationships that are used
to get a large set of parties to agree without use of distributed transactions.
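A minimal Python sketch of this hub-and-spoke arrangement follows. The `Escrow` class, its `receive` and `decide` methods, and the "confirm"/"cancel" message strings are hypothetical names chosen for illustration, and the rule that a single refusal cancels the deal is an assumption of the sketch. Each party talks only to the hub, and the hub alone decides the outcome.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Tuple


@dataclass
class Escrow:
    """The hub: one two-party relationship per participant (hypothetical)."""
    parties: FrozenSet[str]
    answers: Dict[str, bool] = field(default_factory=dict)
    decision: Optional[str] = None

    def receive(self, party: str, agrees: bool) -> None:
        # Each party accepts uncertainty by handing its answer to the hub.
        if self.decision is None and party in self.parties:
            self.answers[party] = agrees

    def decide(self) -> List[Tuple[str, str]]:
        """Return (party, message) pairs once the hub can decide, else []."""
        if self.decision is None:
            if any(not ok for ok in self.answers.values()):
                # Assumption: any single refusal cancels the whole deal.
                self.decision = "cancel"
            elif set(self.answers) == set(self.parties):
                self.decision = "confirm"
            else:
                return []  # still waiting on at least one party
        # The same outcome goes out on every spoke, one message per party.
        return [(party, self.decision) for party in sorted(self.parties)]
```

No party ever coordinates directly with another; the buyer, seller, and mortgage company each hold only their own relationship with the escrow object and wait for its confirm or cancel message.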
When considering almost-infinite scaling, it is interesting to think
about two-party relationships. By
building up from two-party tentative/
cancel/confirm (just like traditional
workflow), you can see the basis for
achieving distributed agreement. Just
as in the escrow company, many entities may participate in an agreement
Because the relationships are
two-party, the simple concept of an
activity as “stuff I remember about
that partner” becomes a basis for
managing enormous systems—even
when the data is stored in entities
and you don’t know where the entity
lives. You must assume it is far away.
Still, it can be programmed in a scale-
Real-world almost-infinite scale applications would love the luxury of a
global transactional scope. Unfortunately, this is not readily available to
most of us without introducing fragility when a system fails.
Instead, the management of the uncertainty of the tentative work is passed
off into the hands of the developer of
the scale-agnostic application. It must
be handled as reserved inventory, allocations against credit lines, and other
As usual, the computer industry is
Today, new design pressures are
being foisted onto programmers who
simply want to solve business problems. Their realities are taking them
into a world of almost-infinite scaling
and forcing them into design problems largely unrelated to the real business at hand. This is as true today as
it was when this article was first published in 2007.
Unfortunately, programmers striving to solve business goals such as e-commerce, supply-chain-management,
financial, and health-care applications
increasingly need to think about scaling without distributed transactions.
Most developers simply don’t have access to robust systems offering scalable
We are at a juncture where the patterns for building these applications
can be seen, but no one is yet applying
these patterns consistently. This article argues that these nascent patterns
can be applied more consistently in
the handcrafted development of applications designed for almost-infinite scaling.
This article has introduced and
named a couple of formalisms emerging in large-scale applications:
˲ Entities are collections of named
(keyed) data that may be atomically updated within the entity but never atom-