state components that were not tested
and perhaps not even considered during design.
Nevertheless, most certification regimes still rely primarily on testing.
Developers and certifiers sometimes
talk self-assuredly about achieving “five
nines” of dependability, meaning that
the system is expected to survive 100,000
commands or hours before failing. The
mismatch between such claims and the
reality of software failures led one procurer to quip, “It’s amazing how quickly
10^5 hours comes around.”
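The arithmetic behind the quip can be sketched as follows. This is an illustration only: the idea of a fleet of units running continuously, the fleet size of 1,000, and the function name are assumptions introduced here, not figures from the article.

```python
# Illustrative arithmetic: how long does it take to log 10^5 operating hours?
# The fleet size and the continuous-operation assumption are hypothetical.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def calendar_hours_to_accumulate(target_hours: float, units_in_service: int) -> float:
    """Calendar hours needed for units running continuously to log target_hours in total."""
    if units_in_service <= 0:
        raise ValueError("units_in_service must be positive")
    return target_hours / units_in_service


TARGET_HOURS = 10**5

# One unit running around the clock.
single_unit_years = calendar_hours_to_accumulate(TARGET_HOURS, 1) / HOURS_PER_YEAR

# A hypothetical fleet of 1,000 units running around the clock.
fleet_days = calendar_hours_to_accumulate(TARGET_HOURS, 1000) / 24

print(f"Single unit: about {single_unit_years:.1f} years")
print(f"Fleet of 1,000 units: about {fleet_days:.1f} days")
```

Under these assumptions, a single unit needs a little over 11 years to reach 10^5 hours, while the hypothetical fleet collectively reaches it in a matter of days.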
A Direct Approach
The direct approach, by definition, is
straightforward. The desired dependability goal is explicitly articulated as a
collection of claims that the system has
some critical properties. An argument,
or dependability case, is constructed
that substantiates the claims. The remainder of this article develops these
notions and outlines some of their
implications, but first we turn to the
fundamental questions of what constitutes a system and what it means for
the system to be dependable.
What is a system? A system is an engineered
product that is introduced to solve a
particular problem. It consists
of software, the hardware platform on
which the software runs, the peripheral devices through which the product
interacts with the environment, and
any other components that contribute
to achieving the product’s goals
(including human operators and users). In many cases,
the system’s designers must assume
that its operators behave in a certain
way. An air traffic management system, for example, cannot prevent a
midair collision if a pilot is determined to hit another aircraft; eliminating this assumption would require
a separation of aircraft that would not
be economically feasible. When a system’s dependability is contingent on
assumptions about its operators, the operators
should be viewed as a component of
the system, and the design of operating procedures should be regarded as an essential part of the overall design.
What does “dependable” mean? A system is dependable if it can be depended
on—that is, trusted—to perform a particular task. As noted earlier, such trust
is only rational when evidence of the
system’s ability to act without exhibiting certain failures has been assessed.
So a system cannot be dependable without evidence, and dependability is thus
not merely the absence of defects or
the failures that may result from them
but the presence of concrete information suggesting that such failures will
not occur.
As in all engineering enterprises,
dependability is a trade-off between
benefits and risks, with the level of assurance (and the quality and cost of
the evidence) being chosen to match
the risk at hand. Our society is not willing to tolerate the failure of a nuclear
power plant, air traffic control center,
or energy distribution network, so
for such systems we will be willing to
absorb larger development and certification costs. Criticality depends, of
course, on the context of use. A spreadsheet program becomes critical if it is
used, say, for calculating radiotherapy
doses. And there are systems, such as
GPS satellites and cellphone networks,
on which so many applications depend
that widespread failure could be catastrophic.
Dependability is not a metric that
can be measured on a simple numeric
scale, because different kinds of failures have very different consequences.
The cost of preventing all failures will
usually be prohibitive, so a dependable
system will not offer uniform levels of
confidence across all functions. In fact,
a large variance is likely to be a characteristic of a dependable system. Thus
a dependable radiotherapy system
may become unavailable but cannot
be allowed to overdose a patient; a dependable e-commerce site may display
advertisements incorrectly, give bad
search results, and perhaps lose shopping-cart items over time, but it must
never bill the wrong amount or leak
customers’ credit card details; a dependable file synchronizer may report
spurious conflicts but should never silently overwrite newer versions of files.
Together, these considerations imply that the first steps in developing a
dependable system involve drawing
its boundaries, that is, deciding which components, physical and human,
will be relied on in addition to the software;
identifying the critical properties; and
determining what level of confidence
is required.
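As a rough sketch, these three first steps could be recorded in a small data structure. Every name below (`SystemScope`, `CriticalProperty`, `Confidence`, and the confidence levels themselves) is hypothetical and chosen for illustration; the article prescribes no particular notation. The example values reuse the radiotherapy scenario mentioned above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Confidence(Enum):
    """Hypothetical confidence levels; the article defines no such scale."""
    MODERATE = 1
    HIGH = 2
    VERY_HIGH = 3


@dataclass(frozen=True)
class CriticalProperty:
    """A property the system must have, with the confidence required in it."""
    description: str
    required_confidence: Confidence


@dataclass
class SystemScope:
    """Records the system boundary and the critical properties claimed of it."""
    name: str
    # Components relied on, physical and human, in addition to the software.
    relied_on_components: list[str]
    critical_properties: list[CriticalProperty] = field(default_factory=list)


# Example drawn from the radiotherapy scenario; the component list is assumed.
scope = SystemScope(
    name="radiotherapy system",
    relied_on_components=["software", "hardware platform", "human operators"],
    critical_properties=[
        CriticalProperty(
            description="never delivers an overdose to a patient",
            required_confidence=Confidence.VERY_HIGH,
        )
    ],
)

for prop in scope.critical_properties:
    print(f"{scope.name}: {prop.description} ({prop.required_confidence.name})")
```

Availability is deliberately left out of the critical properties here, reflecting the point that a dependable radiotherapy system may become unavailable but must not overdose a patient.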
April 2009 | Vol. 52 | No. 4 | Communications of the ACM