As a simple, real-world example, consider going out to the middle of a large
deserted area, pointing a gun away
from oneself, and firing. If there is nobody or nothing in the vicinity, the gun
could be considered to be both reliable
and safe. Consider, however, doing the
same thing in a crowded mall. The gun
has not changed, the gun’s reliability
has not changed, and the action (pulling
the trigger) has not changed. But
the safety certainly has.
The accompanying sidebar highlights three examples out of hundreds
of similar losses.4 Considering reliability only at the system level (instead of
the component level) does not help.
Complex systems almost always have
many requirements (or goals) while
there are constraints on how those
goals can be achieved. As an example,
a chemical plant may very reliably produce chemicals (the goal or mission of
the plant) while at the same time polluting the environment around the
plant. The plant may be highly reliable
in producing chemicals but not safe.
Most safety-critical systems have both
mission (non-safety) requirements and
safety constraints on how the mission
or goals can be achieved. A “system failure” or inability to satisfy its requirements is not equivalent to a hazard or
an accident. One exception is if safety
is the only goal of the system; however,
even for systems such as air traffic control, there are usually non-safety goals
such as optimizing throughput in addition to the safety goals.
A common approach to assessing safety is to use probabilistic risk
assessment to assess the reliability of the components and then to
combine these values to obtain the system reliability. Besides the fact that
this assessment ignores accidents that are caused by the interactions
of “unfailed” components (see Misconception 3), most of these
assessments include only random hardware failures and assume independence
between the failures. Therefore, they provide anything close to a real safety
assessment only when the systems are just hardware and relatively simple. Such
systems existed 50+ years ago when these probabilistic risk methods were
developed; virtually all systems today (particularly complex ones) contain
non-stochastic components including
required control action to escape from the stall, the driver does not see the
pedestrian and does not brake in time to prevent a collision, the weapon
controller thinks that friendly troops are the enemy and initiates friendly fire.
The pilot, driver, and weapon controller can be human or computerized, or
a combination of both.
Accidents involving computers (and
humans) most often occur when their
models of the current state of the controlled process do not match the actual state of
that process; the controller
issues a control action that is appropriate for a different state but not the one
that currently exists. As an example,
the software controller thinks the aircraft is in a stall when it is not and issues a control action to escape the nonexistent stall only to inadvertently put
the aircraft into a dangerous state.
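The stall example can be made concrete with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the `Aircraft` and `Controller` classes, the single pitch variable, and the 10-degree nose-down command are assumptions, not a description of any real flight control system. The point it tries to show is that the controller's logic is executed exactly as written, yet the action is unsafe because the controller's model of the process is wrong.

```python
from dataclasses import dataclass


@dataclass
class Aircraft:
    """Hypothetical controlled process: only pitch and stall state are modeled."""
    pitch_deg: float
    stalled: bool = False


@dataclass
class Controller:
    """Hypothetical controller holding a belief about the process state."""
    believes_stalled: bool = False

    def update_model(self, sensed_stall: bool) -> None:
        # The process model is updated only from feedback, which may be wrong.
        self.believes_stalled = sensed_stall

    def control_action(self) -> float:
        # A nose-down command is appropriate only if the aircraft is stalled.
        return -10.0 if self.believes_stalled else 0.0


def step(aircraft: Aircraft, controller: Controller, sensed_stall: bool) -> bool:
    """Run one loop iteration; return True if the model matched reality."""
    controller.update_model(sensed_stall)
    aircraft.pitch_deg += controller.control_action()
    return controller.believes_stalled == aircraft.stalled


aircraft = Aircraft(pitch_deg=2.0, stalled=False)
controller = Controller()
# Assumed scenario: faulty feedback reports a stall that does not exist.
model_ok = step(aircraft, controller, sensed_stall=True)
print(model_ok, aircraft.pitch_deg)  # False -8.0
```

No line of the controller code "failed" here. The nose-down command would have been correct for a stalled aircraft; it is hazardous only in combination with the actual state of the process.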
Starting from this foundation, let’s
consider some of the most common
misconceptions with respect to software and safety.
Software Itself Can Be Unsafe
Software cannot catch on fire or explode; it is an abstraction. Only physical
entities can inflict damage to life and
property: physical energy is usually required to inflict physical harm. In the
figure in this column, software sends
control signals to a physical process,
which may have physical effects. Nuclear power plants can release radiation,
chemical plants can release toxins,
weapon systems can explode or inadvertently target a friendly object, for
example. One old model of an accident
describes it as uncontrolled energy.
Software does not release energy; it
simply releases bits, which can be used
to send a control signal.
To avoid misconceptions that arise
from the term “software safety,” safety engineers sometimes speak of
“software system safety” to denote the
contribution of software behavior to a
dangerous process. An alternative conception is to speak of the contribution
of software to system safety. Either way,
by considering software in isolation,
without including the controlled physical process, it is not possible to assure
anything about the safety of the system
the software is controlling.
The Ariane 4 Inertial Reference System software was perfectly safe in that
launcher. However, when reused in the
Ariane 5, it led to an explosion and loss
of a satellite. Many accidents involve
reused software.3 It is not the software
that is unsafe, but the entire system
controlled by the software.
Reliable Systems Are Safe; That Is, Reliability and Safety Are Essentially the Same Thing.
Reliability Assessment Can Therefore Act as a Proxy for Safety
Reliability and safety are different system properties, and sometimes they even
conflict. This is also true with respect to the contribution of software to
accidents. System components (including software) can operate 100% reliably
and accidents may still result, usually
from unsafe interactions among the
system components. In addition, the
larger environment (including social
policies and decision making) beyond
the system boundaries is important.
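A small sketch may help separate the two properties. The tank, the `scheduler` and `pump` functions, and all numeric values below are hypothetical and assumed purely for illustration. Each component does exactly what its own specification says, so neither is unreliable, yet their combined behavior violates an assumed system-level safety constraint.

```python
TANK_CAPACITY_L = 15.0  # assumed safety constraint: level must not exceed this


def scheduler(step: int) -> bool:
    """Hypothetical scheduler spec: command the pump on for steps 0 through 3."""
    return step < 4


def pump(command_on: bool) -> float:
    """Hypothetical pump spec: deliver 5 litres per step while commanded on."""
    return 5.0 if command_on else 0.0


level = 0.0
hazard = False
for step in range(6):
    # Each component behaves exactly as its own specification requires.
    level += pump(scheduler(step))
    # The hazard is a property of the combined behavior, not of either part.
    if level > TANK_CAPACITY_L:
        hazard = True

print(level, hazard)  # 20.0 True
```

A reliability assessment of `scheduler` and `pump` taken separately would report no failures. The overflow only becomes visible when the interaction is checked against the safety constraint on the controlled process.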
[Figure: A cyber-human-physical control loop. Recoverable labels: (Human and/or Computer); Model (beliefs) about the state of the controlled process; Feedback.]