Numerous BFT agreement protocols4, 6, 13, 18, 23, 26 have used
tentative execution to reduce the latency experienced by clients. This optimization allows replicas to execute a request
tentatively as soon as they have collected the equivalent of a
Zyzzyva commit certificate for that request. This optimization may appear similar to Zyzzyva’s support for speculative
execution, but there are two fundamental differences. First,
Zyzzyva’s speculative execution allows requests to complete
at a client after a single phase, without the need to compute
a commit certificate: this reduction in latency is not possible
with traditional tentative executions. Second, and more importantly, in traditional BFT systems a replica can execute a request tentatively only after the replica knows that all previous
requests have been committed. In Zyzzyva, replicas continue
to execute requests speculatively, without waiting to know
that requests with lower sequence numbers have completed.
This difference is what lets Zyzzyva leverage speculation to
achieve not just lower latency but also higher throughput.
Speculator20 allows clients to speculatively complete operations at the application level and perform client-level rollback. A similar approach could be used in conjunction with
Zyzzyva to support clients that want to act on a reply optimistically rather than waiting on the specified set of responses.
Zyzzyva’s focus is on maximizing the peak performance of
BFT replication. Conversely, Clement et al. 5 argue that BFT
systems should seek not only to ensure safety but also good
performance in the presence of Byzantine faults, and their
Aardvark system eliminates fragile optimizations that maximize best-case performance but that can allow a faulty client
or server to drive the system down expensive execution paths.
6. concLusion
By systematically exploiting speculation, Zyzzyva exhibits
significant performance improvements over existing BFT
protocols. The throughput overheads and latency of Zyzzyva
approach the theoretical lower bounds for any BFT state machine replication protocol.
Looking forward, although we expect continued progress
in improving the performance (for example, by making additional assumptions about the application characteristics)
and robustness (in the presence of broader range of failure
scenarios) of BFT replication, we believe that Zyzzyva demonstrates that BFT overheads should no longer be regarded
as a barrier to using BFT replication for even many highly
demanding services. acknowledgments
3. amazon s3 availability event: July 20,
2008. http://status.aws.amazon.com/
s3-20080720.html.
4. Castro, m. and liskov, b. practical
byzantine fault tolerance and proactive
recovery. ACM Transactions on
Computer Systems, november 2002.
5. Clement, a., marchetti, m., Wong,
e., alvisi, l., and dahlin, m. making
byzantine fault tolerant services
tolerate byzantine faults. technical
report 08-27, ut austin department
of Computer sciences, may 2008.
6. Cowling, J., myers, d., liskov,
b., rodrigues, r., and shrira, l.
hQ replication: a hybrid quorum
protocol for byzantine fault
tolerance. Proceedings of the
Symposium on Operating Systems
Design and Implementation,
november 2006.
7. dwork, C., lynch, n., and stockmeyer,
l. Consensus in the presence of
partial synchrony. Journal of the
ACM, 1988.
8. fischer, m., lynch, n., and paterson,
m. impossibility of distributed
consensus with one faulty process.
Journal of the ACM, 32( 2):374–382,
april 1985.
9. Ghemawat, s., Gobioff, h., and
leung, s. the Google file system.
Proceedings of 19th ACM Symposium
on Operating Systems Principles,
october 2003.
10. herlihy, m. and Wing, J. linearizability:
a correctness condition for
concurrent objects. ACM Transactions
on Programming Languages and
Systems, 12( 3), 1990.
11. Kotla, r., alvisi, l., dahlin, m.,
Clement, a., and Wong, e. zyzzyva:
speculative byzantine fault tolerance.
ACM Symposium on Operating
Systems Principles, 2007.
12. Kotla, r., alvisi, l., dahlin, m.,
Clement, a., and Wong, e. zyzzyva:
speculative byzantine fault tolerance.
technical report utCs-tr-07-40,
university of texas at austin, 2007.
13. Kotla, r. and dahlin, m. high-throughput byzantine fault tolerance.
Proceedings of the International
Conference on Dependable Systems
and Networks, June 2004.
14. Kotla, r., dahlin, m., and alvisi, l.
safestore: a durable and practical
storage system. Proceedings of
the USENIX Annual Technical
Conference, June 2007.
15. lamport, l., shostak, r., and pease,
m. the byzantine generals problem.
ACM Transactions on Programming
Languages and Systems, 4( 3):
382–401, July 1982.
16. lamport, l. using time instead of
timeout for fault-tolerant distributed
systems. ACM Transactions on
Programming Languages and
Systems, 6( 2):254–280, april 1984.
17. lamport, l. lower bounds
for asynchronous consensus.
Proceedings of the FUDICO, pp.
22–23, June 2003.
18. martin, J.-p. and alvisi, l. fast
byzantine consensus. IEEE TODSC,
3( 3):202–215, July 2006.
19. nightingale, e., veeraraghavan, K.,
Chen, p., and flinn, J. rethink the
sync. Proceedings of the Symposium
on Operating Systems Design and
Implementation, 2006.
20. nightingale, e. b., Chen, p., and
flinn, J. speculative execution in a
distributed file system. Proceedings of
the Symposium on Operating Systems
Principles, october 2005.
21. pease, m., shostak, r., and lamport,
l. reaching agreement in the
presence of faults. Journal of the
ACM, 27( 2):228–234, april 1980.
22. prabhakaran, v., bairavasundaram,
l., agrawal, n., arpaci-dusseau, h.
G. a., and arpaci-dusseau, r. iron
file systems. Proceedings of the
Symposium on Operating Systems
Principles, 2005.
23. rodrigues, r., Castro, m., and liskov,
b. base: using abstraction to improve
fault tolerance. Proceedings of the
Symposium on Operating Systems
Principles, october 2001.
24. schneider, f. b. implementing fault-tolerant services using the state
machine approach: a tutorial. ACM
Computing Surveys, 22( 4):299–319,
1990.
25. yang, J., sar, C., and engler, d.
explode: a lightweight, general
system for finding serious storage
system errors. Proceedings of the
Symposium on Operating Systems
Design and Implementation,
2006.
26. yin, J., martin, J.-p., venkataramani,
a., alvisi, l., and dahlin, m. separating
agreement from execution for byzantine
fault tolerant services.
Proceedings of the Symposium on
Operating Systems Principles, october
2003.
acknowledgments
This work was supported in part by NSF grants CNS-0720649,
CNS-0509338, and CNS-0411026. We thank Rodrigo Rodrigues,
James Cowling, and Michael Abd-El-Malek for sharing source
code for PBFT, HQ, and Q/U, respectively. We are grateful for
Maurice Herlihy’s feedback on earlier drafts of this article.
Ramakrishna Kotla ( kotla@microsoft.com)
microsoft research silicon valley, mountain
view, Ca.
Allen Clement ( aclement@cs.utexas.edu)
department of Computer sciences, university
of texas, austin.
Edmund Wong ( elwong@cs.utexas.edu)
department of Computer sciences, university
of texas, austin.
Lorenzo Alvisi ( lorenzo@cs.utexas.edu)
department of Computer sciences, university
of texas, austin.
Mike Dahlin ( dahlin@cs.utexas.edu)
department of Computer sciences, university
of texas, austin.
References
1. abd-el-malek, m., Ganger, G.,
Goodson, G., reiter, m., and Wylie, J.
fault-scalable byzantine fault-tolerant services. Proceedings of the
Symposium on Operating Systems
Principles, october 2005.
2. aiyer, a. s., alvisi, l., Clement, a.,
dahlin, m., martin, J.-p., and porth, C.
bar fault tolerance for cooperative
services. Proceedings of the Symposium on Operating Systems Principles,
pp. 45–58, october 2005.
© aCm 0001-0782/08/1100 $5.00