Zyzzyva: Speculative Byzantine
Fault Tolerance

By ramakrishna Kotla,* Allen Clement, Edmund Wong, Lorenzo Alvisi, and Mike Dahlin

abstract

A longstanding vision in distributed systems is to build reliable systems from unreliable components. An enticing formulation of this vision is Byzantine fault-tolerant (BFT) state machine replication, in which a group of servers collectively act as a correct server even if some of the servers misbehave or malfunction in arbitrary (“Byzantine”) ways. Despite this promise, practitioners hesitate to deploy BFT systems at least partly because of the perception that BFT must impose high overheads.

In this article, we present Zyzzyva, a protocol that uses speculation to reduce the cost of BFT replication. In Zyzzyva, replicas reply to a client’s request without first running an expensive three-phase commit protocol to agree on the order to process requests. Instead, they optimistically adopt the order proposed by a primary server, process the request, and reply immediately to the client. If the primary is faulty, replicas can become temporarily inconsistent with one another, but clients detect inconsistencies, help correct replicas converge on a single total ordering of requests, and only rely on responses that are consistent with this total order. This approach allows Zyzzyva to reduce replication overheads to near their theoretical minima and to achieve throughputs of tens of thousands of requests per second, making BFT replication practical for a broad range of demanding services.

1. intRoDuction

Mounting evidence suggests that real systems must contend not only with simple crashes but also with more complex failures ranging from hardware data corruption22 to nondeterministic software errors25 to security breaches. Such failures can cause even highly engineered services to become unavailable or to lose data. For example, a single corrupted bit in a handful of messages recently brought down the Amazon S3 storage service for several hours, 3 and several well-known e-mail service providers have occasionally lost customer data. 14

Byzantine fault-tolerant (BFT) state machine replication is a promising approach to masking many such failures and constructing highly reliable and available services. In BFT replication, n ≥ 3f + 1 servers collectively act as a correct server even if up to f servers misbehave or malfunction in arbitrary (“Byzantine”) ways. 15, 16

Today, three trends make real-world deployment of BFT increasingly attractive. First, as noted above, there is mounting evidence of non-fail-stop behaviors in real systems, motivating the use of new techniques to improve robustness. Second, the growing value of data and the falling costs of hardware make it advantageous for service providers to trade increasingly inexpensive hardware for the peace of

* This work was mostly done when the author was at the University of Texas at Austin.

mind potentially provided by BFT replication. Third, improvements to the state of the art in BFT algorithms1, 4, 6, 13, 23, 26 have narrowed the gap between BFT replication costs and the costs already being paid for non-BFT replication by many commercial services. For example, by default, the Google file system uses three-way replication of storage, 9 which is roughly the cost of tolerating one Byzantine failure by using three full replicas plus one additional lightweight node to help the replicas coordinate their actions. 26

Unfortunately, practitioners hesitate to deploy BFT systems at least partly because of the perception that BFT must impose high overheads. This concern motivates our work, which seeks to answer a simple question: Can we build a system that tolerates a broad range of faults while meeting the demands of high-performance services?

To answer this question, this article presents Zyzzyva. Zyzzyva seeks to make BFT replication deployable for the widest range of practical services by implementing the extremely general abstraction of a replicated state machine at an extremely low cost.

The basic idea of BFT state machine replication is simple: a client sends a request to a replicated service and the service’s distributed agreement protocol ensures that correct servers execute the same requests in the same order. 24 If the service is deterministic, each correct replica thus traverses the same series of states and produces the same reply to each request. The servers send their replies back to the client, and the client accepts a reply that matches across a sufficient number of servers.

Zyzzyva builds on this basic approach, but reduces its cost through speculation. As is common in existing BFT replication protocols, an elected primary server proposes an order on client requests to the other server replicas. 4 However, unlike in traditional protocols, Zyzzyva replicas then immediately execute requests speculatively, without running an expensive agreement protocol to definitively establish the order. As a result, if the primary is faulty, correct replicas’ states may diverge, and they may send different responses to a client. Nonetheless, Zyzzyva preserves correctness because a correct client detects such divergence and avoids acting on a reply until the reply and the sequence of preceding requests are stable and guaranteed to eventually be adopted

Zyzzyva (ZIZ-uh-vuh) is the last word in most dictionaries. According to dictionary.com, a zyzzyva is “any of various South American weevils of the genus Zyzzyva, often destructive to plants.”

A previous version of this paper was published in Proceedings of 21st ACM Symposium on Operating Systems Principles (SOSP), October 2007, p. 45–58.

References:

http://dictionary.com

Archives