a programming language, and the ease
with which programmers could use it.
For this reason, I appealed to academic
researchers engaged in programming
language design to help me in the research. The latest response comes from
hardware designers, who are using axioms in anger (and for the same reasons
as given above) to define the properties
of modern multicore chips with weak memory consistency.

One thing I got spectacularly wrong.
I could see that programs were getting
larger, and I thought that testing would
be an increasingly ineffective way of removing errors from them. I did not realize that the success of tests is that they
test the programmer, not the program.
Rigorous testing regimes rapidly persuade error-prone programmers (like
me) to remove themselves from the
profession. Failure in test immediately
punishes any lapse in programming
concentration, and (just as important)
the failure count enables implementers
to resist management pressure for premature delivery of unreliable code. The
experience, judgment, and intuition of
programmers who have survived the rigors of testing are what make programs
of the present day useful, efficient, and
(nearly) correct. Formal methods for
achieving correctness must support the
intuitive judgment of programmers, not replace it.

My basic mistake was to set up proof
in opposition to testing, where in fact
both of them are valuable and mutually supportive ways of accumulating
evidence of the correctness and serviceability of programs. As in other branches
of engineering, it is the responsibility of
the individual software engineer to use
all available and practicable methods,
in a combination adapted to the needs
of a particular project, product, client,
or environment. The best contribution
of the scientific researcher is to extend
and improve the methods available to
the engineer, and to provide convincing
evidence of their range of applicability.
Any more direct advocacy of personal
research results actually excites resistance from the engineer.
On retirement from University, I accepted a job offer from Microsoft Research in Cambridge (England). I was
surprised to discover that assertions,
sprinkled more or less liberally in the
program text, were used in development
practice, not to prove correctness of programs, but rather to help detect and diagnose programming errors. They are
evaluated at runtime during overnight
tests, and indicate the occurrence of any
error as close as possible to the place in
the program where it actually occurred.
The more expensive assertions were
removed from customer code before
delivery. More recently, the use of assertions as contracts between one module
of program and another has been incorporated in Microsoft implementations
of standard programming languages.
This is just one example of the use of
formal methods in debugging, long before it becomes possible to use them in
proof of correctness.
In 1969, my proof rules for programs
were devised to extract easily from
a well-asserted program the mathematical ‘verification conditions’, the
proof of which is required to establish
program correctness. I expected that
these conditions would be proved by
the reasoning methods of standard
logic, on the basis of standard axioms
and theories of discrete mathematics.
What has happened in recent years is
exactly the opposite of this, and even
more interesting. New branches of
applied discrete mathematics have
been developed to formalize the programming concepts that have been
introduced since 1969 into standard
programming languages (for example,
objects, classes, heaps, pointers). New
forms of algebra have been discovered
for application to distributed, concurrent, and communicating processes.
New forms of modal logic and abstract
domains, with carefully restricted expressive power, have been invented to
simplify human and mechanical reasoning about programs. They include
the dynamic logic of actions, temporal
logic, linear logic, and separation logic.
Some of these theories are now being
reused in the study of computational
biology, genetics, and sociology.
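The extraction of verification conditions from a well-asserted program can be recalled with a small invented example built on the 1969 assignment axiom:

```latex
% Hoare's assignment axiom (1969):
\{P[E/x]\}\; x := E\; \{P\}

% For the asserted fragment
\{x \ge 0\}\; x := x + 1\; \{x \ge 1\}

% the axiom computes the precondition (x \ge 1)[x+1/x],
% i.e. x + 1 \ge 1, so the extracted verification condition
% is the purely arithmetic implication
x \ge 0 \;\Rightarrow\; x + 1 \ge 1
```

The point of the extraction is that everything program-specific has been eliminated: what remains is a fact of ordinary arithmetic, of exactly the kind the paragraph above expected standard logic to discharge.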
Equally spectacular (and to me unexpected) progress has been made in the
automation of logical and mathematical proof. Part of this is due to Moore’s
Law. Since 1969, we have seen steady exponential improvements in computer
capacity, speed, and cost, from megabytes to gigabytes, and from megahertz
to gigahertz, and from megabucks to
kilobucks. There has been also at least
a thousand-fold increase in the efficiency of algorithms for proof discovery and
counterexample (test case) generation.
Crudely multiplying these factors, a
trillion-fold improvement has brought
us over a tipping point, at which it has
become easier (and certainly more reliable) for a researcher in verification to
use the available proof tools than not to
do so. There is a prospect that the activities of a scientific user community will
give back to the tool-builders a wealth
of experience, together with realistic
experimental and competition material, leading to yet further improvements
of the tools.
For many years I used to speculate
about the eventual way in which the results of research into verification might
reach practical application. A general
belief was that some accident or series of accidents involving loss of life,
perhaps followed by an expensive suit
for damages, would persuade software
managers to consider the merits of program verification.
This never happened. When a bug
occurred, like the one that crashed the
maiden flight of the Ariane V spacecraft
in 1996, the first response of the manager was to intensify the test regimes, on
the reasonable grounds that if the erroneous code had been exercised on test,
it would have been easily corrected before launch. And if the issue ever came
to court, the defense of ‘state-of-the-art’
practice would always prevail. It was
clearly a mistake to try to frighten people into changing their ways. Far more
effective is the incentive of reduction in
cost. A recent report from the U.S. Department of Commerce has suggested
that the cost of programming error to
the world economy is measured in tens