merges the remote causal history {a1,
a2} with the local history {b1} and the
new unique name b2, leading to {a1,
a2, b1, b2}.
Checking causality between two
events x and y can be tested simply by
set inclusion: x → y iff Hx Hy. This
follows from the definition of causal
histories, where the causal history of
an event will be included in the causal
history of the following event. Even
better, marking the last local event
added to the history (distinguished in
bold in the figure) allows the use of a
simpler test: x → y iff x ∈ Hy (for example, a1 → b2, since a1 ∈ {a1, a2, b1, b2}).
This follows from the fact a causal
history includes all events that (
causally) precede a given event.
Causality Tracking
It should be obvious by now that causal histories work but are not very compact. This problem can be addressed
by relying on the following observation: the mechanism of building the
causal history implies if an event b3
is present in Hy, then all preceding
events from that same node, b1 and b2,
are also present in Hy. Thus, it suffices
to store the most recent event from
each node. Causal history {a1, a2, b1,
b2, b3, c1, c2, c3} is compacted to {a
2, b 3, c 3} or simply a vector [ 2,
3, 3].
Now the rules used with causal
histories can be translated to the new
compact vector representation.
Verifying that x → y requires checking if Hx Hy. This can be done, verifying for each node, if the unique
names contained in Hx are also contained in Hy and there is at least one
unique name in Hy that is not contained in Hx. This is immediately
translated to checking if each entry in
the vector of x is smaller or equal to
the corresponding entry in the vector
of y and one is strictly smaller (such
as, ∀i : Vx[i] ≤ Vy [i] and ∃j : Vx[j] < Vy [j]).
This can be stated more compactly as
x → y iff Vx < Vy.
For a new event the creation of a
new unique name is equivalent to
incrementing the entry in the vector
for the node where the event is created. For example, the second event in
node C has vector [0, 0, 2], which corresponds to the creation of event c2 of
the causal history.
Finally, creating the union of the
two causal histories Hx and Hy is
equivalent to taking the pointwise
maximum of the corresponding two
vectors Vx and Vy (such as, ∀i : V [i] =
max(Vx[i], Vy [i])). Logic tells us that,
for the unique names generated in
each node, only the one with the largest counter needs to be kept.
When a message is received, in addition to merging the causal histories,
a new event is created. The vector representation of these steps can be seen,
for example, when the first message
from a is received in b, where taking
the pointwise maximum leads to [ 2,
1, 0] and the new unique name finally
leads to [ 2, 2, 0], as shown in Figure 3.
This compact representation,
known as a vector clock, was introduced around 1988.5, 10 Vector comparison is an immediate translation
of set inclusion of causal histories.
This equivalence is often forgotten in
modern descriptions of vector clocks
and can turn what is a simple encoding problem into an unnecessarily
complex and arcane set of rules, going against logic.
As shown thus far, when using
causal histories, knowing the last
event could simplify comparison by
simply checking if the last event is
included in the causal history. This
can still be done with vectors, if you
keep track of the node in which the
last event has been created. For example, when questioning if x = [ 2, 0,
0] → y = [ 2, 3, 0], with boldface indicating the last event in each vector,
you can simply test if x[0] ≤ y[0] (
2 ≤ 2)
since you have marked the last event
in x was created in node A (that is, it
corresponds to the first entry of the
vector). Since marking numbers in
bold is not a practical implementation, however, the last event is usually stored outside the vector (and is
sometimes called a dot): for example,
[ 2,
2, 0] can be represented as [ 2, 1, 0]
b2. Notice that now the vector represents the causal past of b2, excluding
the event itself.
In an important class of applications there is no need to register causality for all the events in a distributed
computation. For example, to modify
replicas of data, it often suffices to
Figure 4. Causal histories with only some relevant events.
node A
node B
node C
time
{ } { }
{a1}
{b1}
{a1} {a1, a2}
{a1, b1, b2}
{a1, b1, b2}
{a1, b1, b2}
Figure 5. Version vectors with only some relevant events.
node A
node B
node C
time
[
1,0,0] [
1,0,0] [
2,0,0]
[0,
1,0]
[ 1,
2,0]
[ 1,
2,0]
[0,0,0] [0,0,0] [ 1,
2,0]