figure 5: Latency versus throughput for the 0/0 benchmark. Q/u
throughput is scaled from1. Q/U best latency is the measured latency
for our idealized model implementation of Q/u under low offered load.
Latency per request (ms)
Q/U max throughput
PBF T (b = 10)
Zyzzyva5 (b= 10)
Zyzzyva (b= 10)
Q/U best latency
40 60 80 100 120
peak throughput. Adaptively setting the batch size in response
to workload characteristics is an avenue for future work.
Overall, all of the BFT protocols do add service latency
compared to an unreplicated server, but Zyzzyva is generally
competitive with the best protocols by this metric. We speculate that the additional 120 to 250 microseconds that Zyzzyva requires compared to an unreplicated server will be a significant barrier for only the most demanding services, and
we note that the relative gap would shrink for services that
do more than execute the null request.
4. 4. Performance during failures
Zyzzyva guarantees correct execution with any number of
faulty clients and up to f faulty replicas. However, its performance is optimized for the case of failure-free operation,
and a single faulty replica can force Zyzzyva to execute the
slower two-phase protocol.
One solution is to buttress Zyzzyva’s fast one-phase path
by employing additional servers. Zyzzyva511 uses a total of
5f + 1 servers (2f + 1 full replicas and 3f additional witnesses)
to allow the system to complete requests via the fast communication pattern shown in Figure 1a when the client receives
4f + 1 (out of 5f + 1) matching replies.
Surprisingly, however, even when running with 3f + 1 replicas, Zyzzyva remains competitive with existing protocols
even when it falls back on two-phase operation. In particular, Zyzzyva’s cryptographic overhead at the bottleneck replica increases from 2 + ( (3f + 1)/b) to 3 + ( (5f + 1)/b) operations
per request if we simply execute the two-phase algorithm
described above.** Furthermore, our implementation also
includes a commit optimization12 that reduces cryptographic
overheads to 2 + ( (5f + 1)/b) cryptographic operations per request by having replicas that suspect a faulty primary initiate and complete the second phase to commit the request
before they execute the request and send the response (with
** As noted at the start of this section we omit the preferred quorums optimization in our experimental evaluations, so the 2 + (3f + 1)b MAC operations
per request in our measurements is higher than the 2 + 3f/b listed in Table 2.
figure 6: Realized throughput for the 0/0 benchmark as the number of
clients varies when f = 1 nonprimary replicas fail to respond to requests.
Zyzzyva5 (b = 10)
Zyzzyva (b = 10)
PBF T (b = 10)
Zyzzyva without commit opt (b = 10)
Zyzzyva5 (b = 1)
Zyzzyva (b = 1)
Zyzzyva without commit opt (b = 1)
PBFT (b = 1)
Number of clients
the committed history) back to the client.
Figure 6 compares throughputs of Zyzzyva, Zyzzyva5,
PBFT, and HQ in the presence of f nonprimary-server fail-stop failures. We do not include a discussion of Q/U in this
section as the throughput numbers of Q/U with failures are
not reported, 1 but we would not expect a fail-stop failure by
a replica to significantly reduce the performance shown for
Q/U in Figure 3. Also, we do not include a line for the unreplicated server case as the throughput falls to zero when the
only server suffers a fail-stop failure.
As Figure 6 shows, without the commit optimization,
falling back on two-phase operation reduces Zyzzyva’s maximum throughput from 86K requests/second (Figure 3) to
52K requests/second. Despite this extra overhead, Zyzzyva’s
“slow case” performance remains within 13% of PBFT’s performance, which is less highly tuned for the failure-free case
and which suffers no slowdown in this scenario. Zyzzyva’s
commit optimization repairs most of the damage caused by a
fail-stop replica, maintaining a throughput of 82 K requests/
second which is within 5% of the peak throughput achieved
for the failure-free case. For systems that can afford extra
witness replicas, Zyzzyva5’s throughput is not significantly
affected by the fail-stop failure of a replica, as expected.
5. ReLateD WoRK
Zyzzyva stands on the shoulders of recent efforts that have
dramatically cut the costs and improved the practicality of
BFT replication. Castro and Liskov’s PBFT protocol4 devised
techniques to eliminate expensive signatures and potentially fragile timing assumptions, and it demonstrated high
throughputs of over 10K requests/second. This surprising
result jump started an arms race in which researchers reduced replication costs, 26 and improved performance1, 6, 13
of BFT service replication. Zyzzyva incorporates many of the
ideas developed in these protocols and folds in the new idea
of speculative execution to construct an optimized fast path
that significantly outperforms existing protocols and that
has replication cost, processing overhead, and latency that
approach the theoretical minima for these metrics.