by all correct servers. Thus, applications at correct clients
observe the traditional abstraction of a replicated state machine that executes requests in a linearizable10 order.
Essentially, Zyzzyva “rethinks the sync” 19 for BFT. Whereas past BFT systems have pessimistically enforced the condition that a correct server only emits replies that are stable,
Zyzzyva recognizes that this condition is stronger than required. Instead, Zyzzyva enforces the weaker condition:
a correct client only acts on replies that are stable. This change
allows us to move the output commit from the servers to the
client, which in the optimized case allows servers to avoid
expensive all-to-all communication that they would otherwise require to ensure the stronger condition.
Leveraging the client in this way allows us to minimize
server overheads and maximize throughputs in the optimized, failure-free case. As a result, Zyzzyva’s peak measured
throughput of over 86K requests/second on 3.0GHz
Pentium-IV machines makes it feasible to utilize BFT replication in a broad range of demanding services. Despite this
aggressive optimization to the fault-free case, Zyzzyva retains
good performance of over 82K requests/second even when
up to f backup replicas crash. In fact, Zyzzyva’s replication
costs, processing overheads, and communication latencies
approach their theoretical lower bounds.
2. s Ystem moDeL
To maximize fault tolerance, BFT replication assumes what
is essentially an adversarial failure model. Under this mod-
el, faulty nodes (servers or clients) may deviate from their
intended behavior in arbitrary ways, representing problems
such as hardware faults, software faults, node misconfigu-
rations, or even malicious attacks. This model further as-
sumes a strong adversary that can coordinate faulty nodes to
compromise the replicated service. Note, however, that our
model assumes the adversary cannot break cryptographic
techniques like collision-resistant hashes, encryption, and
signatures; we denote a message m signed by principal q’s
public key as 〈m〉 . Zyzzyva ensures its safety and liveness
properties if at most f replicas are faulty, and it assumes a
finite client population, any number of which may be faulty.
It makes little sense to build a system that can tolerate Byzantine replicas/servers‡ and clients but that can be corrupted
by an unexpectedly slow node or network link, hence we design
Zyzzyva so that its safety properties hold in any asynchronous
distributed system where nodes operate at arbitrary speeds
and are connected by a network that may fail to deliver messages, corrupt them, delay them, or deliver them out of order.
Unfortunately, ensuring both safety and liveness for consensus in an asynchronous distributed system is impossible if any
server can crash, 8 let alone if servers can be Byzantine. Zyzzyva’s
liveness, therefore, is ensured only during intervals in which
messages sent to correct nodes are processed within some arbitrarily large fixed (but potentially unknown) worst-case delay
from when they are sent. This assumption appears easy to meet
in practice if broken links are eventually repaired.
Zyzzyva implements a BFT service using state machine replication. 16, 24 Traditional state machine replication techniques
‡ We use the terms replica and server interchangeably.
87 communications of the acm | november 2008 | vol. 51 | no. 11
can be applied only to deterministic services. Zyzzyva copes with
the nondeterminism present in many real-world applications
such as file systems and databases using standard techniques
to abstract the observable application state at the replicas and
to resolve nondeterministic choices via the agreement stage. 23
If a client of a service issues an erroneous or malicious request, Zyzzyva’s job is to ensure that the request is processed
consistently at all correct replicas; the replicated service, itself, is responsible for protecting its application state from
such erroneous requests. Services typically limit the damage
by authenticating clients and enforcing access control. For
example, in a replicated file system, if a client tries to write
a file without appropriate credentials, correct replicas could
all process the request by returning an error code.
3. aGReement PRotocoL
Zyzzyva is a state machine replication protocol executed by
3f + 1 replicas and based on three subprotocols: ( 1) agreement, ( 2) view change, and ( 3) checkpoint. The agreement
subprotocol orders requests for execution by the replicas.
Agreement operates within a sequence of views, and in each
view a single replica, designated the primary, is responsible
for leading the agreement subprotocol. The view change subprotocol coordinates the election of a new primary when the
current primary is faulty or the system is running slowly. The
checkpoint subprotocol limits the state that must be stored by
replicas and reduces the cost of performing view changes.
For simplicity, this article focuses on the agreement subprotocol. The view change and execution subprotocols are
similar to those used previously. 4, 26 Interested readers may
refer to Kotla et al. 11 for the full protocol.
Figure 1 shows the communication pattern for a single instance of Zyzzyva’s agreement subprotocol. In the fast, no-fault
case (Figure 1a), a client simply sends a request to the primary,
the primary forward the request to the replicas, and the replicas
figure 1: Protocol communication pattern for agreement within a
view for (a) the fast case and (b) the two-phase faulty replica case.
the numbers refer to the main steps of the protocol in the text.
3f+ 1 4a
(a) Fast case
2f+ 1 6
Speculative execution Commit
(b) Two-phase case