causes only 3% of the time.
29 Problems
with requirements and usability dwarf
the problems of bugs in code, suggesting that the emphasis on coding practices and tools, both in academia and
industry, may be mistaken. Exploiting
tools to check arguments at the design
and requirements level may be more
important; and it is often more feasible, as artifacts at the higher level are
much smaller.
21
Nevertheless, the correctness of
code is a vital link in the dependability chain. Even if the low incidence of
failures due to bugs reflects success in
improving code quality, the cost is still
unacceptable,
11 especially when very
high assurance is required. Note too
that in the arena of security, code vulnerabilities are responsible for a much
higher proportion of failures than in
the safety arena.
Testing and analysis. Testing is a crucial part of any software-development
process, and its effectiveness can be
amplified by liberal use of runtime assertions, by formulating tests early on,
by creating tests in response to bug reports, and by integrating testing into
the build so that tests are run frequently and automatically. But as discussed
above, testing cannot generally deliver
the high levels of confidence that are
required for critical systems. Thus
analysis is needed to fill the gap.
Analysis might involve any of a variety of techniques, depending on the
kind of property being checked and
the level of confidence required. In the
last decade, dramatic advances have
been made in analyses that establish
properties of code fully automatically
through the use of theorem proving,
static analysis, model checking, and
model finding.
How well these techniques will work
and how widely they will be adopted
remain to be seen. But a number of
industrial successes demonstrate that
the approaches are at least feasible
and, in the right context, effective.
Microsoft, for example, now includes
a sophisticated verification component in its driver development toolkit;
3
Praxis has achieved extraordinarily low
defect rates using a variety of formal
methods;
16 and Airbus has used static
analysis to show the absence of low-level runtime errors in the A340 and A380
flight-control software.
7
Until these approaches are more
widely adopted, many development
teams will choose to rely instead on
manual code review. In any case, it is
important to realize that arguments
that are not mechanically checked are
likely to be flawed, so their credibility
must suffer and confidence in any dependability claims that rely on them
must be reduced accordingly.
The credibility of tools. Tools are
enormously valuable, but the glamour
of automation can sometimes overwhelm our better judgment. A symptom of this is our tendency to invest
terms used to describe tools with more
significance than their simple meaning. For example, inventors of program
analyses have long classified their
creations as “sound” or “unsound.” A
sound analysis establishes a property
with perfect reliability. That is, if the
analysis does not report a bug, then
there is no possible execution that can
violate the property. This notion helpfully distinguishes verifiers from bug
finders—a class of tools that are very
effective at catching defects, especially
in low-quality software, but that usually
cannot contribute evidence of dependability because they tend to be heuristic and therefore unsound.
But the assumption that sound tools
are inherently more credible is dangerous. Alex Aiken found that an unsound
tool uncovered errors in a codebase
that a prior analysis, using a sound tool,
had failed to catch. The much higher
volume of false alarms produced by
the sound tool overwhelmed its users
and made the real defects harder to
identify. In recent years, developers of
1
analysis tools have come to realize that
the inclusion of false positives is just
as problematic as the exclusion of true
positives and that more sophisticated
measures are needed.
Even if an analysis establishes a
property with complete assurance, the
question of whether the property itself
is sufficient still remains. For example,
eliminating arithmetic overflows and
array bounds errors from a program is
certainly progress. But knowing that
such faults are absent may not help
the dependability case unless there is
either: a chain of reasoning connecting this knowledge to assertions about
end-to-end properties; or some strong
statistical evidence that the absence of
these faults is correlated with the absence of other faults.
Among analysis tools, mathematical
proof is generally believed to offer the
highest level of confidence. An analysis
substantiated with a proof can be certified independently by examining the
proof in isolation, thereby mitigating
the concern that the tool that produced
the proof might have been faulty.
Proof is not foolproof, however.
When a bug was reported in his own
code (part of the Sun Java library), Joshua Bloch found4 that the binary search
algorithm—proved correct many years
before (by, amongst others Jon Bentley
in his Communications column) and
upon which a generation of programmers had relied—harbored a subtle
flaw. The problem arose when the sum
of the low and high bounds exceeded
the largest representable integer. Of
course, the proof wasn’t wrong in a technical sense; there was an assumption
that no integer overflow would occur
(which was reasonable when Bentley
wrote his column, given that computer memories back then were not large
enough to hold such a large array). In
practice, however, such assumptions
will always pose a risk, as they are often
hidden in the very tools we use to reason about systems and we may not be
aware of them until they are exposed.
closing thoughts
The central message of this article is that
it is not rational to believe that a software
system is dependable without good reason. Thus any approach that promises
to develop dependable software must
provide such reason. A clear and explicit
articulation is needed of what “
dependable” means for the system at hand, and
an argument must be made that takes
into account not only the correctness of
the code but also the behavior of all the
other components of the system, including human operators.
Is this approach practical? The cost
of constructing a dependability case,
after all, may be high. On the other
hand, such construction should focus
resources, from the very start of the
development, where they bring the
greatest return, and the effort invested
in obtaining a decoupled design may
reduce the cost of maintenance later.
The experience of Praxis shows that
many of the approaches that the indus-
APriL 2009 | voL. 52 | no. 4 | communicAtionS of the Acm
87