practice
When a system
processes trillions
and trillions of
requests, events
that normally have
a low probability
of occurrence
are now guaranteed
to happen and
must be accounted
for upfront in
the design and
architecture
of the system.

reading from the promoted backup will produce old, inconsistent values. Also to support better scalable read performance, RDBMSs have started to provide the ability to read from the backup, which is a classical case of providing eventual consistency guarantees in which the inconsistency windows depend on the periodicity of the log shipping.

On the server side we need to take a deeper look at how updates flow through the system to understand what drives the different modes that the developer who uses the system can experience. Let’s establish a few definitions before getting started:

N = The number of nodes that store replicas of the data.

W = The number of replicas that need to acknowledge the receipt of the update before the update completes.

R = The number of replicas that are contacted when a data object is accessed through a read operation.

If W+R > N, then the write set and the read set always overlap and one can guarantee strong consistency. In the primary-backup RDBMS scenario, which implements synchronous replication, N= 2, W= 2, and R= 1. No matter from which replica the client reads, it will always get a consistent answer. In the asynchronous replication case with reading from the backup enabled, N= 2, W= 1, and R= 1. In this case R+W=N, and consistency cannot be guaranteed.

The problems with these configurations, which are basic quorum protocols, is that when because of failures the system cannot write to W nodes, the write operation has to fail, marking the unavailability of the system. With N= 3 and W= 3 and only two nodes available, the system will have to fail the write.

In distributed storage systems that

provide high performance and high availability the number of replicas is in general higher than two. Systems that focus solely on fault tolerance often use N= 3 (with W= 2 and R= 2 configurations). Systems that must serve very high read loads often replicate their data beyond what is required for fault tolerance; N can be tens or even hundreds of nodes, with R configured to 1 such that a single read will return a result. Systems that are concerned with consistency are set to W=N for updates, which may decrease the probability of the write succeeding. A common configuration for these systems that are concerned about fault tolerance but not consistency is to run with W= 1 to get minimal durability of the update and then rely on a lazy (epidemic) technique to update the other replicas.

How to configure N, W, and R depends on what the common case is and which performance path needs to be optimized. In R= 1 and N=W we optimize for the read case, and in W= 1 and R=N we optimize for a very fast write. Of course in the latter case, durability is not guaranteed in the presence of failures, and if W < (N+ 1)/2, there is the possibility of conflicting writes when the write sets do not overlap.

Weak/eventual consistency arises when W+R <= N, meaning that there is a possibility that the read and write set will not overlap. If this is a deliberate configuration and not based on a failure case, then it hardly makes sense to set R to anything but 1. This happens in two very common cases: the first is the massive replication for read scaling mentioned earlier; the second is where data access is more complicated. In a simple key-value model it is easy to compare versions to determine the latest value written to the system, but in

References:

Archives