tions from committed transactions are
never lost, operations from concurrent
transactions do not interleave, and all or
none of the operations from a transaction persist, despite failures of application servers or storage servers.
Unfortunately, distributed transactions have long been thought to be prohibitively expensive. In modern storage
systems, which partition data for scalability and replicate data for fault tolerance, distributed transactions need
coordination at every level: on each
storage server, across replicas, and
across partitions.
Three recent research papers presented here have made significant
strides in reducing the coordination
needed for distributed transactions,
making them more efficient at every
level. The first reduces the cost of
read-only transactions across geo-distributed data centers using atomic clocks. The second reduces the
cost of read-write transactions across
replicas by eliminating consistency
from the replication protocol. The
last reduces the cost of transactions
on each storage server using a modular concurrency-control mechanism.
Taken together, these papers demonstrate that it is possible to provide
distributed transactions with low
cost, even at Google scale.
High-Performance Read-Only
Transactions with Atomic Clocks
Corbett, J. C., et al.
Spanner: Google’s globally distributed
database. In Proceedings of Operating Systems
Design and Implementation, 2012;
http://static.googleusercontent.com/media/
research.google.com/en//archive/spanner-
osdi2012.pdf.
Linearizable transactions are useful
for programmers because they behave
in a way that is easy to understand:
there is a single global transaction
ordering and it matches the order in
which the transactions commit. Un-
fortunately, linearizable transactions
are expensive, especially in a globally
distributed system, because they re-
quire every transaction to coordinate
with every other transaction, includ-
ing read-only transactions.
Spanner gets around this problem
by using loosely synchronized clocks.
Every storage server synchronizes
with an atomic clock in the data center, and they estimate the clock skew
between servers based on the drift
between the atomic clocks. Then
Spanner assigns every read-write
transaction a timestamp and waits
out the clock skew to ensure that the
timestamp is in the past, allowing
read-only transactions to read at their
local current time without any coordination. This technique comes with
a caveat, however: if their estimate of
the clock skew is off, Spanner no longer guarantees a linearizable transaction ordering.
High-Performance
Read-Write Transactions with
Unordered Replication
Zhang, I., et al.
Building consistent transactions with
inconsistent replication. Symposium on Operating
Systems Principles, 2015; https://homes.
cs.washington.edu/~arvind/papers/tapir.pdf.
While Spanner makes read-only transactions less expensive, it does not
reduce the cost of read-write transactions. This selection makes the observation that there is wasted work in
existing databases when committing
transactions: both the transaction
protocol and the replication protocol
enforce a strong ordering. Thus, it is
possible to eliminate the coordination
across replicas by using a completely
unordered replication protocol and
enforce only a linearizable ordering of
committed transactions.
The paper introduces an unordered, consensus-based replication
protocol, called inconsistent replication, and defines TAPIR (
Transactional Application Protocol for Inconsistent Replication) to run on top
of it. TAPIR also uses loosely synchronized clocks but as a performance optimization, not a correctness require-
Three recent
research papers
have made
significant strides
in reducing
the coordination
needed for
distributed
transactions,
making them
more efficient
at every level.