Another famous metaphor of yours is the
Part-time Parliament on the fictional is-
land of Paxos.
The people at DEC were building a
distributed file system. When they explained the properties they wanted—an
asynchronous algorithm that would
maintain consistency despite any number of non-Byzantine faults, and would
make progress if any majority of the processors were working—I thought it was
impossible. So I set about trying to prove
that it could not be done, and I failed by
discovering the algorithm, which was independently discovered by Brian Oki and
Barbara Liskov around the same time.
Yet a fully asynchronous system must
either never make a mistake, but perhaps never get anything done, or else be
guaranteed to get something done but
perhaps make some mistakes—the so-called “safety” and “liveness” properties
that you coined.
Safety properties assert that the program does not do anything bad, while liveness properties assert that it eventually
does something. Paxos guarantees that,
assuming only crash failures, the system
never does anything wrong, but in principle it is possible for it to never succeed
in doing anything. In practice, as long as
the system is not behaving completely erratically, things will eventually get done.
So based on the success of the Byzantine
generals, you invented the story about
Paxos and the parliament.
It was a complete disaster. That pa-
per would have died except for Butler
Lampson, who understood its practical
that if one
message could causally affect another
message, it would wind up with a lower
timestamp. Once you have this ordering
of messages, you can use it to imple-
ment a state machine. A state machine
is a mathematical abstraction of a digi-
tal system as something with a state that
takes inputs; an input causes it to
change to a new state and generate an
output. So I realized that you could im-
plement an arbitrary distributed system
by describing what it is supposed to do
as a state machine, using an algorithm
that implements an arbitrary state ma-
chine in a distributed system. The sim-
plest example I could think of was to
write a state machine which solved Dijk-
stra’s mutual exclusion problem.
Several years later, you popularized the
so-called Byzantine Generals Problem—
in which a group of generals, some of
whom may be traitors, have to reach a
common decision—and invented a solution based on digital signatures.
The “Time, Clocks” paper introduced
the idea of implementing a distributed
system using a state machine, but that
algorithm assumed no failures. Yet we
build distributed systems because we expect things to fail. These days, when we
talk about failures, we mean computers
crashing. At the time, that body of practice did not exist, so I considered arbitrary failures, where a computer might
produce a wrong answer.
Most people thought that to handle
one bad computer, you had to have a
total of three and use majority voting.
But in a distributed system, a computer
might tell one computer that the answer is 5, and the other computer that
the answer is 2. So you actually need
However, if you use digital signatures, you can use three computers.
If one process signs those messages,
those two other processes can tell that
it is faulty, because it signed messages
with two different values. The original
paper didn’t talk about Byzantine generals; it talked about interactive consistency. But I thought this was a really
important problem, and I realized that
giving it this catchy notion of the Byzantine generals would make the problem
more popular, so I wrote another paper
with the same authors, Marshall Pease
and Robert Shostak.
significance. Eventually, other people
Let’s talk about LaTeX, the typesetting
system you designed. If I understand
correctly, you were writing a book yourself and were unhappy with the macros
available at the time for TeX?
The system that I had been using was
called Scribe. And it was really easy to
use, and then TeX came along, and TeX
was very hard to use, but it was also much
more powerful. So my basic idea was to
try to build macros that would make TeX
as easy to use as Scribe.
You must have learned a lot about typog-
I think people have this sense that the
purpose of typographic design is to make
something look nice, whereas the real
purpose is to make it easier to read.
You also introduced—and continue to
refine—a powerful formalism for describing and reasoning about concurrent systems known as Temporal Logic
of Actions, or TLA.
Amir Pnueli introduced temporal
logic to computer science, and people
quickly started using it as a specification
language. I soon realized it was exactly
what you wanted to use to reason about
Amir’s original logic was beautiful
and simple, but it was not powerful
enough to express the things that we
wanted to express in writing specifications.
The approach most people took was
to develop more complicated temporal
operators, but I realized that the best way
to specify safety properties is with my old
friend the state machine. And then I had
the wonderful idea of extending Amir’s
logic not by extending the temporal operators, but by extending the class of formulas to which you apply the temporal
operators. So I was able to describe the
state machine in a very straightforward
and practical way as a temporal logic
formula, and then adding the liveness
condition was just another part of the
formula, so I could describe the entire
specification as a single temporal logic
formula, but in a practical way.
Leah Hoffmann is a technology writer based in Piermont,
© 2014 ACM 0001-0782/14/06 $15.00
The Parliament of
Paxos story “was a
That paper would
have died except
for Butler Lampson,
[CONTINUED FROM P. 112]