Zyzzyva: Speculative Byzantine
Fault Tolerance
By ramakrishna Kotla,* Allen Clement, Edmund Wong, Lorenzo Alvisi, and Mike Dahlin
abstract
A longstanding vision in distributed systems is to build reliable
systems from unreliable components. An enticing formulation
of this vision is Byzantine fault-tolerant (BFT) state machine
replication, in which a group of servers collectively act as a correct server even if some of the servers misbehave or malfunction in arbitrary (“Byzantine”) ways. Despite this promise, practitioners hesitate to deploy BFT systems at least partly because
of the perception that BFT must impose high overheads.
In this article, we present Zyzzyva, a protocol that uses
speculation to reduce the cost of BFT replication. In Zyzzyva,
replicas reply to a client’s request without first running an
expensive three-phase commit protocol to agree on the order
to process requests. Instead, they optimistically adopt the
order proposed by a primary server, process the request, and
reply immediately to the client. If the primary is faulty, replicas can become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas
converge on a single total ordering of requests, and only rely
on responses that are consistent with this total order. This
approach allows Zyzzyva to reduce replication overheads to
near their theoretical minima and to achieve throughputs of
tens of thousands of requests per second, making BFT replication practical for a broad range of demanding services.
1. intRoDuction
Mounting evidence suggests that real systems must contend
not only with simple crashes but also with more complex failures ranging from hardware data corruption22 to nondeterministic software errors25 to security breaches. Such failures
can cause even highly engineered services to become unavailable or to lose data. For example, a single corrupted bit in a
handful of messages recently brought down the Amazon S3
storage service for several hours, 3 and several well-known e-mail service providers have occasionally lost customer data. 14
Byzantine fault-tolerant (BFT) state machine replication
is a promising approach to masking many such failures and
constructing highly reliable and available services. In BFT
replication, n ≥ 3f + 1 servers collectively act as a correct server even if up to f servers misbehave or malfunction in arbitrary (“Byzantine”) ways. 15, 16
Today, three trends make real-world deployment of BFT
increasingly attractive. First, as noted above, there is mounting evidence of non-fail-stop behaviors in real systems, motivating the use of new techniques to improve robustness.
Second, the growing value of data and the falling costs of
hardware make it advantageous for service providers to
trade increasingly inexpensive hardware for the peace of
* This work was mostly done when the author was at the University of Texas
at Austin.
mind potentially provided by BFT replication. Third, improvements to the state of the art in BFT algorithms1, 4, 6, 13, 23, 26
have narrowed the gap between BFT replication costs and
the costs already being paid for non-BFT replication by many
commercial services. For example, by default, the Google
file system uses three-way replication of storage, 9 which is
roughly the cost of tolerating one Byzantine failure by using
three full replicas plus one additional lightweight node to
help the replicas coordinate their actions. 26
Unfortunately, practitioners hesitate to deploy BFT systems at least partly because of the perception that BFT must
impose high overheads. This concern motivates our work,
which seeks to answer a simple question: Can we build a system that tolerates a broad range of faults while meeting the demands of high-performance services?
To answer this question, this article presents Zyzzyva.†
Zyzzyva seeks to make BFT replication deployable for the
widest range of practical services by implementing the extremely general abstraction of a replicated state machine at
an extremely low cost.
The basic idea of BFT state machine replication is simple: a client sends a request to a replicated service and the
service’s distributed agreement protocol ensures that correct servers execute the same requests in the same order. 24
If the service is deterministic, each correct replica thus traverses the same series of states and produces the same reply
to each request. The servers send their replies back to the
client, and the client accepts a reply that matches across a
sufficient number of servers.
Zyzzyva builds on this basic approach, but reduces its
cost through speculation. As is common in existing BFT replication protocols, an elected primary server proposes an order on client requests to the other server replicas. 4 However,
unlike in traditional protocols, Zyzzyva replicas then immediately execute requests speculatively, without running an
expensive agreement protocol to definitively establish the
order. As a result, if the primary is faulty, correct replicas’
states may diverge, and they may send different responses to
a client. Nonetheless, Zyzzyva preserves correctness because
a correct client detects such divergence and avoids acting
on a reply until the reply and the sequence of preceding requests are stable and guaranteed to eventually be adopted
† Zyzzyva (ZIZ-uh-vuh) is the last word in most dictionaries.
According to dictionary.com, a zyzzyva is “any of various South American
weevils of the genus Zyzzyva, often destructive to plants.”
A previous version of this paper was published in
Proceedings of 21st ACM Symposium on Operating Systems
Principles (SOSP), October 2007, p. 45–58.