The (trivial) lower bound on processing overhead is for
each server to process two MAC operations per client request: one to verify the client’s request and one to authenticate its reply. Zyzzyva and PBFT approach this bound by
using a batching optimization in which the primary accumulates a batch of b client requests and leads agreement on
that batch rather than on each individual request. Zyzzyva’s
speculative execution allows it to avoid several rounds of all-to-all communication among servers, so it requires fewer
MAC operations per batch than PBFT.
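To make the amortization concrete, here is a minimal sketch (ours, not the paper's model). The per-batch agreement cost c_agree and the example value 12 are hypothetical placeholders; the actual cost depends on the protocol and on f.

```python
# Minimal sketch (not from the paper): amortized MAC operations per request.
# The constant 2 is the trivial lower bound described above (verify the
# client's request, authenticate the reply); c_agree is a hypothetical
# per-batch agreement cost that depends on the protocol and on f.

def macs_per_request(batch_size: int, c_agree: float) -> float:
    """Amortized MAC operations per client request at a server."""
    return 2 + c_agree / batch_size

# With a made-up agreement cost of 12 MACs per batch, batching with b = 10
# brings the per-request cost close to the bound of 2.
print(macs_per_request(1, 12))   # 14.0 without batching
print(macs_per_request(10, 12))  # 3.2 with b = 10
```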
In the last row of the table body, latency counts the number of one-way message delays from when a client issues a request until it receives a reply. In the general case, agreement
requires three message delays,17 and Zyzzyva matches this
bound by having requests go from the client to the primary
to the replicas to the client. Q/U circumvents this bound by
optimizing for the case of no request contention so that requests go directly from the client to the replicas to the client. We chose to retain the extra hop through the primary in
Zyzzyva because it facilitates batching, which we consider to
be an important throughput optimization.
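As a simple illustration of how these latencies are counted (our sketch, not part of either protocol), each hop below is one one-way message delay:

```python
# Sketch: count one-way message delays along the critical paths described above.
def one_way_delays(path):
    """Number of one-way message delays for a request/reply path."""
    return len(path) - 1

zyzzyva_path = ["client", "primary", "replicas", "client"]  # 3 delays
qu_path = ["client", "replicas", "client"]                  # 2 delays (no contention)

print(one_way_delays(zyzzyva_path))  # 3
print(one_way_delays(qu_path))       # 2
```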
The models described in this subsection focus on what
we regard as important factors for understanding the performance trade-offs of different algorithms, but they necessarily
omit details present in implementations. Also, as is customary,1,4,6,23,26 Table 2 compares the protocols’ performance during the optimized case of fault-free, timeout-free execution. In
the rest of this section we experimentally examine these protocols’ throughput, latency, and performance during failures.
4.2. Throughput
Figure 3 shows the throughput measured for the 0/0 benchmark for Zyzzyva, Zyzzyva5,11 PBFT, and HQ (scaled as noted
above). For reference, we also show the peak throughput reported for Q/U1 in the f = 1 configuration, scaled to our environment as described above.
Zyzzyva executes over 50K requests/second without
batching, and this number rises to 86K requests/second
when batching is activated with 10 requests per batch. As the
figure illustrates, Zyzzyva enjoys a significant throughput advantage over the other protocols.
Figure 3: Realized throughput for the 0/0 benchmark as the number of clients varies. Q/U throughput is scaled from1. (The plot shows throughput in Kops/s versus the number of clients for an unreplicated server, for Zyzzyva, Zyzzyva5, and PBFT with batching (b = 10) and without, for HQ, and for the Q/U max throughput.)
It is also worth noting that when batching is enabled,
Zyzzyva’s throughput is within 35% of the throughput of an
unreplicated server that simply replies to client requests
over an authenticated channel. Furthermore, this gap would
fall if the service being replicated were more demanding
than the null service examined here. Overall, we speculate
that Zyzzyva’s throughput is sufficient to support BFT replication for a broad range of demanding services.
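As a back-of-envelope reading of the 35% figure (our arithmetic, assuming "within 35%" means batched Zyzzyva achieves at least 65% of the unreplicated rate), the implied unreplicated throughput is roughly:

```python
# Back-of-envelope check (our arithmetic, not a measurement): if batched
# Zyzzyva at 86K requests/second is within 35% of an unreplicated server,
# the unreplicated server handles at most about 86K / 0.65 requests/second.

zyzzyva_batched = 86_000   # requests/second with b = 10 (from the text)
gap = 0.35                 # "within 35%" (from the text)

unreplicated_estimate = zyzzyva_batched / (1 - gap)
print(f"{unreplicated_estimate:,.0f} requests/second")  # ~132,000
```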
4.3. Latency
Figure 4 shows the latencies of Zyzzyva, Zyzzyva5, HQ, and
PBFT for the 0/0, 0/4, and 4/0 workloads with a single client issuing one request at a time. We examine both the default read/write requests that use the full protocol and read-only requests that can exploit Zyzzyva and PBFT’s read-only optimization.4
We did not succeed in getting Abd-El-Malek et al.’s implementation of Q/U running in our environment. However,
because Table 2 suggests that Q/U may have a latency advantage over other protocols, for comparison we implement an
idealized model of Q/U designed to provide an optimistic
estimate of Q/U’s latency in our environment. In our idealized implementation, a client simply generates and sends
4f + 1 MACs with a request, each replica verifies 4f + 1 MACs
(1 to authenticate the client and 4f to validate the reported state), each replica in a preferred quorum6 generates and
sends 4f + 1 MACs (1 to authenticate the reply to the client
and 4f to authenticate the new state) with a reply to the client, and the client verifies 4f + 1 MACs.
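As a bookkeeping aid, the sketch below tallies the per-role MAC counts of this idealized model for a given f; the function and its naming are ours, while the counts come directly from the description above.

```python
# Tally of MAC operations in the idealized Q/U model described above,
# per role, for f tolerated faults. The counts mirror the text; the
# function itself is only a bookkeeping aid.

def idealized_qu_mac_counts(f: int) -> dict:
    return {
        "client_generates": 4 * f + 1,   # MACs sent with the request
        "replica_verifies": 1 + 4 * f,   # 1 client auth + 4f for reported state
        "replica_generates": 1 + 4 * f,  # 1 reply auth + 4f for new state
        "client_verifies": 4 * f + 1,    # MACs on the reply
    }

print(idealized_qu_mac_counts(1))
# {'client_generates': 5, 'replica_verifies': 5, 'replica_generates': 5, 'client_verifies': 5}
```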
For the read/write 0/0 and 4/0 benchmarks, Q/U does
have a modest latency advantage over Zyzzyva as predicted
by Table 2. For the read-only benchmarks, the situation is reversed with Zyzzyva exhibiting modestly lower latency than
Q/U because Zyzzyva’s read-only optimization completes
read-only requests in two message delays (like Q/U) but uses
fewer cryptographic operations.
Figure 5 shows latency and throughput as we vary offered
load for the 0/0 benchmark. As the figure illustrates, batching in
Zyzzyva, Zyzzyva5, and PBFT increases latency but also increases
Figure 4: Latency for the 0/0, 0/4, and 4/0 benchmarks. (Bars show latency in ms for HQ, Zyzzyva, Zyzzyva5, PBFT, the idealized Q/U model, and an unreplicated server on each benchmark, including the read-only (r/o) variants.)