rent updates have occurred, with histories {a1} and {b1}, as neither a1 →
b1 nor b1 → a1. In this case, a new version that merges the two updates is
created (merge is denoted by the join
symbol ), which requires creating a
new unique name, leading to {a1, b1,
b2}. When the state of replica b is later
propagated to replica c, as no concurrent update exists in replica c, no new
version is created.
Again, vectors can compact the
representation. The result, known as
a version vector, was created in 1983,
12
five years before vector clocks. Figure 5 presents the same example as
before, represented with version vectors.
In some cases when the state of
one replica is propagated to another
replica, the two versions are kept by
the system as conflicting versions. For
example, in Figure 6, when the message from node A is received in node
B, the system keeps each causal history {a1} and {b1} associated with the
respective version. The causal history
associated with the node containing
both versions is {a1, b1}, the union of
the causal history of all versions. This
approach allows later checking for
causality relations between each version and other versions when merging the states of additional nodes.
The conflicting versions could also be
merged, creating a new unique name,
as in the example.
One limitation of causality tracking
by vectors is that one entry is needed for
each source of concurrency.
4 You can
expect a difference of several orders
of magnitude between the number of
nodes in a datacenter and the number
of clients they handle. Vectors with one
entry per client do not scale well when
millions of clients are accessing the
service.
7 Again, a look at the foundation of causal histories shows how to
overcome this limitation.
The basic requirement in causal
histories is each event be assigned
a unique identifier. There is no re-
quirement this unique identifier be
created locally or immediately. Thus,
in systems where nodes can be di-
vided into clients and servers and
where clients communicate only with
servers, it is possible both to delay
the creation of a new unique name
until the client communicates with
register only those events that change
replicas. In this case, when think-
ing about causal histories, you need
only to assign a new unique name to
these relevant events. Still, you need
to propagate the causal histories
when messages are propagated from
one site to another and the remaining
rules for comparing causal histories
remain unchanged.
Figure 4 presents the same example as before, but now with events that
are not registered for causality tracking denoted with . If the run represents the updates to replicas of a data
object, then after nodes A and B are
concurrently modified, the state of
replica a is sent to replica b (in a message). When the message is received
in node B, it is detected two concur-
Figure 6. Causal histories with versions not immediately merged.
node A
node B
node C
time
{ } { }
{a1}
{b1}
{a1}
{a1},{b1}
{a1, a2}
{a1, b1, b2} {a1, b1, b2}
{a1, b1, b2}
Figure 7. Causal histories in a distributed storage system.
client B
client A
Server S
Server T
time
{} {}
put
{ } { }
{} {}
put
{s1}
{s1},{s2}
{t1, t2} {t1, t2, t3} {t1, t2, t3},{s1}
Figure 8. Dotted version vectors in distributed storage system.
client B
client A
Server S
Server T
time
[0,0]
[0,0]
put
[0,0] [0,0]
[0,0]
put
[0,0]s1 [0,0]s1,[0,0]s2
[0, 2]t3 [0, 2]t2,[0,0]s1 [0, 1]t2