ing a database for queries on stock
trades. Once completed, trades cannot
change, so any answers given that are
based solely on the immutable
historical data will remain true. However, if
your database keeps track of the value
of the latest trade, then new information—such new stock prices—might
retract old information, as new stock
prices overwrite the latest ones in the
database. Without coordination between replica copies, the second database may return inconsistent data.
By analyzing programs for monotonicity, you can “bless” monotonic
programs as “safe” under eventual
consistency and encourage the use of
coordination protocols (such as strong
consistency) in the presence of non-monotonicity. As a general rule, operations such as initializing variables, accumulating set members, and testing a
threshold condition are monotonic. In
contrast, operations such as variable
overwrites, set deletion, counter resets,
and negation (such as, “there does not
exist a trade such that…”) are generally
not logically monotonic.
CALM captures a wide space of design
patterns sometimes referred to as ACID
2.0 (associativity, commutativity, idempotence, and distributed). 13
Associativity means you can apply a function
in any order: f(a,f(b,c)) = f(f(a,b),c).
Commutativity means a function’s
arguments are order-insensitive:
f(a,b) = f(b,a). Commutative and associative programs are order-insensitive
and can tolerate message reordering,
as in eventual consistency. Idempotence
means you can call a function on the same
input any number of times and get the
same result: f(f(x))=f(x) (for example,
max( 42, max( 42, 42)) = 42). Idempotence allows the use of at-least-once
message delivery, instead of at-most-once delivery (which is more expensive
to guarantee). Distributed is primarily
a placeholder for D in the acronym (!)
but symbolizes the fact that ACID 2.0
is all about distributed systems. Carefully applying these design patterns can
achieve logical monotonicity.
Recent work on CRDTs (
commutative, replicated data types) embodies
CALM and ACID 2.0 principles within
a variety of standard data types, providing provably eventually consistent data
structures including sets, graphs, and
sequences. 20 Any program that correct-
the complexities
of operating
a strongly
consistent service
at scale
may outweigh
the benefit of, say,
preventing
an off-by-one error
in Justin Bieber’s
follower count
on twitter.
ly uses these predefined, well-specified
data structures is guaranteed to never
see any safety violations.
To understand CRDTs, consider
building an increment-only counter
that is replicated on two servers. We
might implement the increment operation by first reading the counter’s
value on one replica, incrementing
the value by one, and writing the new
value back every replica. If the counter is initially 0 and two different users simultaneously initiate increment
operations at two separate, then both
users may then read 0 and then distribute the value 1 to the replicas;
the counter ends up with a value of 1
instead of the correct value of 2. Instead, we can use a G-counter CRDT,
which relies on the fact that increment
is a commutative operation—it does
not matter in what order the two
increment operations are applied, as
long as they are both eventually applied at all sites. With a G-counter, the
current counter status is represented
as the count of distinct increment invocations, similar to how counting is
introduced at the grade-school level:
by making a tally mark for every increment then summing the total. In our
example, instead of reading and writing counter values, each invocation
distributes an increment operation.
All replicas end up with two increment operations, which sum to the
correct value of 2. This works because
replicas understand the semantics of
increment operations instead of providing general-purpose read/write operations, which are not commutative.
A key property of these advances is
that they separate data store and application-level consistency concerns.
While the underlying store may return
inconsistent data at the level of reads
and writes, CALM, ACID 2.0, and CRDT
appeal to higher-level consistency criteria, typically in the form of application-level invariants that the application maintains. Instead of requiring
that every read and write to and from
the data store is strongly consistent,
the application simply has to ensure
a semantic guarantee (say, “the counter is strictly increasing”)—granting
considerable leeway in how reads and
writes are processed. This distinction
between application-level and read/
write consistency is often ambiguous