DOI: 10.1145/1364782.1364788
In Search of Dependable Design
How can software and hardware developers
increase the reliability of their designs?
In 1994, an obscure circuitry
error was discovered in Intel's Pentium I microprocessor. Thomas R. Nicely, a
mathematician then affiliated
with Lynchburg College in Virginia,
noticed that the chip gave incorrect answers to certain floating-point division
calculations. Other researchers soon
confirmed the problem and identified
additional examples. And though Intel
initially tried to downplay the mistake,
the company eventually responded
to mounting public pressure by
offering to replace each one of the
flawed processors.
“It was the first error to make the
evening news,” recalls Edmund Clarke
of Carnegie Mellon University. The cost
to the company: around $500 million.
Nearly 15 years later, the Pentium
bug continues to serve as a sobering reminder of how expensive design flaws
can be. The story is no different for software: a $170 million virtual case management system was scrapped by the
FBI in 2005 due to numerous failings,
and a flawed IRS tax-processing system
consumed billions of dollars in the late
1990s before it was finally fixed. And in
an era in which people rely on computers in practically every aspect of their
lives—in cars, cell phones, airplanes,
ATMs, and more—the cost of unreliable design is only getting higher. Data
is notoriously difficult to come by, but
a 2002 study conducted by the National
Institute of Standards and Technology
(NIST) estimated that faulty software
alone costs the U.S. economy as much
as $59.5 billion a year in lost information, squandered productivity, and increased repair and maintenance.
But it’s not just a matter of money—
increasingly, people’s lives are at stake.
Faulty software has plunged cockpit
displays into darkness, sunk oil rigs,
and caused missiles to malfunction.
“There have been only a few real disasters due to software. But we’re walking closer and closer to the edge,” says
MIT’s Daniel Jackson.
Experts agree that flaws typically
arise not from minor bugs in code,
but during the higher-level design process. (Security flaws, which tend to be
caused by implementation-level vulnerabilities, are often an exception to
this rule.) One class of problems arises
at the requirements phase: program
design requirements are often poorly
articulated, or poorly understood. Another class arises from insufficient human factors design, where engineers
make unwarranted assumptions about
the environment in which software or
hardware will operate. If a program
isn’t capable of handling those unforeseen conditions, it may fail.
But mistakes can happen at any
time. “Since humans aren’t perfect,
humans make mistakes, and mistakes
can be made in any step of the development process,” cautions Gerard Holzmann of the NASA/JPL Laboratory for
Reliable Software.
Holzmann is among a small group
of researchers who are committed
to developing tools, techniques, and
procedures for increasing design reliability. Currently, most programs are
debugged and then refined by random
testing. Testing can be useful to pinpoint smaller errors, say researchers,
but inadequate when it comes to identifying structural ones. And tests designed for specific scenarios may not
be able to explore combinations of behavior that fall outside of anticipated
patterns. The search is therefore on for
additional strategies.
One promising technique is known
as model checking. The idea is to verify
the logic behind a particular software
or hardware design by constructing
a mathematical model and using an
algorithm to make sure it satisfies
certain requirements. Though the
task can be time consuming, it forces
developers to articulate their requirements in a systematic, mathematical
way, thereby minimizing ambiguity. More importantly, however, model
checkers automatically give diagnostic
counterexamples when mistakes are
found, helping developers pinpoint
what went wrong and catch flaws before they are coded.
“When people use the term ‘reliability,’ they might have some probabilistic notion that ‘only rarely’ do errors
crop up, whereas people in the formal
verification community mean that all
behaviors are correct against all specified criteria,” explains Allen Emerson
of the University of Texas at Austin. (In
recognition of the importance of formal verification techniques, the 2007
ACM A.M. Turing Award was given to
Edmund Clarke, Allen Emerson, and
Joseph Sifakis for their pioneering
work in model checking. A Q&A with