can no longer all communicate with each other.
7. A disaster. The local cluster is
wiped out by a flood, earthquake, etc.
The cluster no longer exists.
8. A network failure in the WAN connecting the clusters together. The WAN
failed and clusters can no longer all
communicate with each other.
First, note that errors 1 and 2 will
cause problems with any high availability scheme. In these two scenarios,
there is no way to keep going; i.e., availability is impossible to achieve. Also,
replica consistency is meaningless; the
current DBMS state is simply wrong.
Error 7 will only be recoverable if a local transaction is only committed after
the assurance that the transaction has
been received by another WAN-connected cluster. Few application builders are willing to accept this kind of
latency. Hence, eventual consistency
cannot be guaranteed, because a transaction may be completely lost if a disaster occurs at a local cluster before the
transaction has been successfully forwarded elsewhere. Put differently, the
application designer chooses to suffer data loss when a rare event occurs,
because the performance penalty for
avoiding it is too high.
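The commit rule described above can be sketched as follows. This is a minimal illustration, not any particular system's protocol; the remote-acknowledgment function and the latency figure are invented for the example.

```python
# Sketch of the commit rule that would make error 7 (loss of a whole
# cluster) survivable: hold the local commit until at least one remote,
# WAN-connected cluster has acknowledged the transaction. The remote
# call and its latency figure are hypothetical.

import time

WAN_ROUND_TRIP = 0.04  # 40 ms: a plausible cross-region round trip

def commit(txn, remote_ack):
    """Commit locally only after a remote cluster acknowledges `txn`."""
    start = time.monotonic()
    if not remote_ack(txn):        # blocks for a full WAN round trip
        raise RuntimeError("no remote ack; cannot commit safely")
    return time.monotonic() - start  # the price paid on *every* commit

def slow_remote_ack(txn):
    time.sleep(WAN_ROUND_TRIP)     # simulate the WAN round trip
    return True

print(f"committed after {commit('t1', slow_remote_ack):.3f}s")
```

The point of the sketch is the cost model: every commit pays a WAN round trip, which is exactly the latency few application builders will accept.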
As such, errors 1, 2, and 7 are examples of cases for which the CAP theorem
simply does not apply. Any real system
must be prepared to deal with recovery
in these cases. The CAP theorem cannot be appealed to for guidance.
Let us now turn to cases where the
CAP theorem might apply. Consider
error 6 where a LAN partitions. In my
experience, this is exceedingly rare,
especially if one replicates the LAN (as
Tandem did). Considering local failures (3, 4, 5, and 6), the overwhelming
majority cause a single node to fail,
which is a degenerate case of a network partition that is easily survived by
lots of algorithms. Hence, in my opinion, one is much better off giving up P
rather than sacrificing C. (In a LAN environment, I think one should choose
CA rather than AP.) Newer SQL OLTP
systems appear to do exactly this.
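The "degenerate partition" point can be made concrete with a majority-quorum sketch: a write succeeds as long as a majority of replicas acknowledge it, so a single dead node is survived automatically. The replica objects and failure model below are invented for illustration.

```python
# Why a single-node failure is a "degenerate partition": a write still
# commits as long as a majority of replicas acknowledge it. The Replica
# class and its failure model are hypothetical.

def quorum_write(replicas, value):
    """Attempt a write on every replica; succeed on a strict majority of acks."""
    acks = 0
    for replica in replicas:
        try:
            replica.write(value)   # raises on a failed node
            acks += 1
        except ConnectionError:
            pass                   # the dead node is simply skipped
    return acks > len(replicas) // 2

class Replica:
    def __init__(self, up=True):
        self.up = up
        self.log = []
    def write(self, value):
        if not self.up:
            raise ConnectionError("node down")
        self.log.append(value)

# Three replicas, one dead: the write still commits (2 of 3 acks).
cluster = [Replica(), Replica(up=False), Replica()]
print(quorum_write(cluster, "x=1"))  # True
```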
Next, consider error 8, a partition
in a WAN network. There is enough
redundancy engineered into today’s
WANs that a partition is quite rare. My
experience is that local failures and
application errors are way more likely.
Moreover, the most likely WAN failure is to separate a small portion of
the network from the majority. In this
case, the majority can continue with
straightforward algorithms, and only
the small portion must block. Hence, it
seems unwise to give up consistency all
the time in exchange for availability of
a small subset of the nodes in a fairly rare scenario.
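The majority-side rule just described is simple to state in code: a side of the partition may continue only if it still holds a strict majority of the full membership. The node names and counts below are illustrative.

```python
# The majority-side rule for a WAN partition: the side holding a strict
# majority of the full membership keeps serving consistently; the
# minority side blocks. Node names are invented for illustration.

def can_serve(reachable_nodes, total_nodes):
    """A side may continue iff it holds a strict majority of all nodes."""
    return len(reachable_nodes) > total_nodes // 2

TOTAL = 5
majority_side = {"a", "b", "c", "d"}   # 4 of 5 nodes stay connected
minority_side = {"e"}                  # the 1 cut-off node

print(can_serve(majority_side, TOTAL))  # True  -- keeps running
print(can_serve(minority_side, TOTAL))  # False -- must block
```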
Lastly, consider a slowdown either
in the OS, the DBMS, or the network
manager. This may be caused by a skew
in load, buffer pool issues, or innumerable other reasons. The only decision one can make in these scenarios
is to “fail” the offending component;
i.e., turn the slow response time into a
failure of one of the cases mentioned
earlier. In my opinion, this is almost
always a bad thing to do. One simply
pushes the problem somewhere else
and adds a noticeable processing load
to deal with the subsequent recovery.
Also, such problems invariably occur
under a heavy load—dealing with this
by subtracting hardware is going in the wrong direction.
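The only lever available against a slowdown is exactly this reclassification: a timeout that turns slowness into failure, after which the normal failover machinery (and its extra recovery load) kicks in. The threshold below is invented; real failure detectors are more elaborate, but the decision is the same.

```python
# The decision described above: a failure detector has no way to
# distinguish "slow" from "dead", so past a timeout it declares the
# component failed. The threshold is a hypothetical value.

TIMEOUT_SECONDS = 0.5

def classify(response_time):
    """Reclassify a slow response as a failure, as a timeout-based detector does."""
    return "failed" if response_time > TIMEOUT_SECONDS else "ok"

print(classify(0.1))  # ok
print(classify(2.0))  # failed -- triggers failover plus recovery work
```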
Obviously, one should write software
that can deal with load spikes without
failing; for example, by shedding load
or operating in a degraded mode. Also,
good monitoring software will help
identify such problems early, since the
real solution is to add more capacity.
Lastly, self-reconfiguring software that
can absorb additional resources quickly is obviously a good idea.
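Load shedding and degraded operation can be sketched as an admission decision made per request. The threshold and priority scheme below are invented for the example; the point is that the system rejects or degrades work cheaply instead of slowing down until it is declared "failed".

```python
# Sketch of load shedding / degraded mode: under a spike, shed or
# degrade low-value requests rather than let the whole system slow
# down. The threshold and priority labels are hypothetical.

MAX_IN_FLIGHT = 100

def admit(in_flight, priority):
    """Decide whether to serve, degrade, or shed an incoming request."""
    if in_flight < MAX_IN_FLIGHT:
        return "serve"
    if priority == "high":
        return "serve-degraded"   # e.g., skip optional work
    return "shed"                 # reject cheaply, with a retry hint

print(admit(10, "low"))    # serve
print(admit(150, "high"))  # serve-degraded
print(admit(150, "low"))   # shed
```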
In summary, one should not throw
out the C so quickly, since there are
real error scenarios where CAP does
not apply and it seems like a bad trade-off in many of the other situations.
1. Eric Brewer, "Towards Robust Distributed Systems."
2. Jim Gray, "Why Do Computers Stop and What Can Be Done About It," Tandem Computers Technical Report 85.7, Cupertino, CA, 1985. http://www.hpl.hp.com/techreports/tandem/tr-85.7.pdf
Disclosure: Michael Stonebraker is associated with four startups that are producers or consumers of database technology.
“Degenerate network partitions” is a very
good point—in practice I have found that
most network partitions in the real world
are of this class.
I like to term certain classes of network
partitions “trivial.” If there are no clients
in the partitioned region, or if there are no
servers in the partitioned region, it is then
trivial. So it could involve more than one
machine, but it is then readily handled.
I think a lot of the discussion about
distributed database semantics, much like
a lot of the discussion about SQL vs. NoSQL,
has been somewhat clouded by a shortage
of pragmatism. So an analysis of the
CAP theorem in terms of actual practical
situations is a welcome change :-)
Michael Stonebraker is an adjunct professor at the
Massachusetts Institute of Technology.
© 2010 ACM 0001-0782/10/1000 $10.00