other. This eventual convergence, however, does not provide SSI semantics.
First, the “predictable order” will not
necessarily correspond to an execution
that could have arisen under SSI; eventual consistency does not specify which
value is eventually chosen. Second,
there is an unspecified window before
convergence is reached, during which
the system will not provide SSI semantics, but rather arbitrary values. As we
will illustrate, this promise of eventual
convergence is a rather weak property.
Finally, a system with SSI provides
eventual consistency—the “
eventuality” is immediate—but not vice versa.
Why is eventual consistency useful? Pretend you are in charge of the
data infrastructure at a social network
where users post new status updates
that are sent to their followers’ timelines, represented by separate lists—
one per user. Because of large scale
and frequent server failures, the database of timelines is stored across multiple physical servers. In the event of
a partition between two servers, however, you cannot deliver each update
to all timelines. What should you do?
Should you tell the user he or she cannot post an update, or should you wait
until the partition heals before providing a response? Both of these strategies
choose consistency over availability, at
the cost of user experience.
Instead, what if you propagate the
update to the reachable set of followers’ timelines, return to the user, and
delay delivering the update to the other
followers until the partition heals? In
choosing this option, you give up the
guarantee that all users see the same set
of updates at every point in time (and
admit the possibility of timeline reordering as partitions heal), but you gain
high availability and (arguably) a better
user experience. Moreover, because updates are eventually delivered, all users
eventually see the same timeline with
all of the updates that users posted.
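As a concrete illustration, here is a minimal sketch in Go of that fan-out strategy, under simplifying assumptions: follower timelines live in memory on one machine, and a reachable flag plus a pending queue stand in for the followers' servers and the network partition. All of the names here are illustrative, not a real API.

package main

import "fmt"

// Timeline is one follower's list of status updates.
type Timeline struct {
	user      string
	reachable bool // stand-in for "on our side of the partition"
	posts     []string
}

// deliver appends the update, or reports failure if the follower's
// server is unreachable.
func (t *Timeline) deliver(update string) bool {
	if !t.reachable {
		return false
	}
	t.posts = append(t.posts, update)
	return true
}

// post fans an update out to every follower and returns immediately;
// undeliverable updates are queued for redelivery after the partition
// heals, so all followers eventually see the same set of updates.
func post(update string, followers []*Timeline, pending map[*Timeline][]string) {
	for _, f := range followers {
		if !f.deliver(update) {
			pending[f] = append(pending[f], update)
		}
	}
}

func main() {
	alice := &Timeline{user: "alice", reachable: true}
	bob := &Timeline{user: "bob", reachable: false} // partitioned away
	pending := map[*Timeline][]string{}

	post("new status", []*Timeline{alice, bob}, pending)
	fmt.Println(alice.posts, bob.posts) // [new status] []

	bob.reachable = true // the partition heals
	for _, u := range pending[bob] {
		bob.deliver(u)
	}
	fmt.Println(bob.posts) // [new status]
}

The poster's request completes whether or not every follower is reachable, which is precisely the availability-over-consistency choice described above.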
Implementing eventual consistency.
A key benefit of eventual consistency
is that it is fairly straightforward to implement. To ensure convergence, replicas must exchange information with
one another about which writes they
have seen. This information exchange
is often called anti-entropy, a homage
to the process of reversing entropy,
or thermodynamic randomness, in a physical system.19 Protocols for achieving anti-entropy take a variety of forms;
one simple solution is to use an asynchronous all-to-all broadcast: when a
replica receives a write to a data item,
it immediately responds to the user,
then, in the background, sends the
write to all other replicas, which in turn
update their locally stored data items.
In the event of concurrent writes to a
given data item, replicas deterministically choose a “winning” value, often using a simple rule such as “last writer wins” (for example, via a clock value embedded in each write).22
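To make the convergence argument concrete, here is a minimal sketch in Go of such a deterministic “last writer wins” register, assuming each write carries a writer-supplied clock value; the Write and Register types and the writer-ID tie-breaker are illustrative.

package main

import "fmt"

// Write pairs a value with the clock value embedded by its writer.
type Write struct {
	Value     string
	Timestamp int64
	WriterID  int // tie-breaker keeping the rule deterministic
}

// Register holds one replica's current winning write for a data item.
type Register struct {
	current Write
}

// Apply merges an incoming write: keep whichever write has the higher
// timestamp, breaking ties by writer ID.
func (r *Register) Apply(w Write) {
	if w.Timestamp > r.current.Timestamp ||
		(w.Timestamp == r.current.Timestamp && w.WriterID > r.current.WriterID) {
		r.current = w
	}
}

func main() {
	a, b := &Register{}, &Register{}
	w1 := Write{Value: "hello", Timestamp: 10, WriterID: 1}
	w2 := Write{Value: "world", Timestamp: 12, WriterID: 2}
	// Deliver the same writes in different orders to each replica...
	a.Apply(w1)
	a.Apply(w2)
	b.Apply(w2)
	b.Apply(w1)
	// ...and both converge on the later write.
	fmt.Println(a.current.Value, b.current.Value) // world world
}

The tie-breaker matters: without it, two writes carrying equal timestamps could leave different replicas holding different values.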
Suppose you want to make a single-node database into an eventually consistent distributed database. When
you get a request, you route it to any
server you can contact. When a server
performs a write to its local key-value
store, it can send the write to all other servers in the cluster. This write-forwarding becomes the anti-entropy
process. Be careful, however, when
sending the write to the other servers. If you wait for other servers to respond before acknowledging the local
write, then, if another server is down
or partitioned from you, the write request will hang indefinitely. Instead,
you should send the request in the
background; anti-entropy should be
an asynchronous process. Implicitly,
the model for eventual consistency
assumes system partitions are eventually healed and updates are eventually
propagated, or that partitioned nodes
eventually die and the system ends up
operating in a single partition.
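The following sketch in Go puts these pieces together under the stated assumptions: the local write is acknowledged immediately, and forwarding to each peer runs in a background goroutine that simply retries until the peer is reachable again. The Replica type, SendWrite, and the retry loop are illustrative, not a real database API.

package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// Replica is one server's local key-value store plus its peer list.
type Replica struct {
	mu        sync.Mutex
	store     map[string]string
	peers     []*Replica
	reachable bool // simulates a network partition
}

// SendWrite delivers a forwarded write, failing while partitioned.
func (r *Replica) SendWrite(key, value string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.reachable {
		return errors.New("replica unreachable")
	}
	r.store[key] = value
	return nil
}

// Put applies the write locally and acknowledges it immediately;
// forwarding to peers happens asynchronously, so a downed or
// partitioned peer never blocks the client.
func (r *Replica) Put(key, value string) {
	r.mu.Lock()
	r.store[key] = value
	r.mu.Unlock()
	for _, p := range r.peers {
		go func(p *Replica) {
			// Anti-entropy stalls and retries until the partition
			// heals; the model assumes unreachable peers eventually
			// recover or die.
			for p.SendWrite(key, value) != nil {
				time.Sleep(50 * time.Millisecond)
			}
		}(p)
	}
}

func (r *Replica) get(key string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.store[key]
}

func main() {
	a := &Replica{store: map[string]string{}, reachable: true}
	b := &Replica{store: map[string]string{}, reachable: false} // down

	a.peers = []*Replica{b}
	a.Put("status", "hello")     // returns without waiting on b
	fmt.Println(b.get("status")) // "": update not yet delivered

	b.mu.Lock()
	b.reachable = true // the partition heals
	b.mu.Unlock()
	time.Sleep(200 * time.Millisecond)
	fmt.Println(b.get("status")) // "hello": anti-entropy caught up
}

Waiting on the peers inside Put instead, as the text warns, would make the write hang for as long as any peer stays down.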
The eventually consistent system
has some great properties. It does not
require writing difficult “corner-case”
code to deal with complicated scenarios such as downed replicas or network
partitions—anti-entropy will simply
stall—or writing complex code for coordination such as master election. All
operations complete locally, meaning
latency will be bounded. In a geo-replicated scenario, with replicas located in
different data centers, you do not have
to endure long-haul wide-area network
latencies on the order of hundreds of
milliseconds on the request fast path.
The mechanism just described, returning immediately on the local write, can
put data durability at risk. An intermediate point in the trade-off between durability and availability is to return after W