Technical Perspective
A Whitebox Solution for Blackbox-Like Behaviors
By David G. Andersen

research highlights
DOI: 10.1145/3361564

DEEP NEURAL NETWORKS (DNNs) are rapidly becoming an indispensable part of the computing toolbox, with particular success in helping to bridge the messy analog world into forms we can process with more conventional computing techniques (image and speech recognition being two of the most obvious examples). The price we pay, however, is inscrutability: DNNs behave like black boxes, without clearly explainable logic for their functioning. Admitting for the moment that most complex software systems are also approximately impossible to fully reason about, we have—and continue to develop—methods for formally reasoning about and extensively testing critical components. Almost nothing equivalent exists for DNNs. This is particularly worrying precisely because of the power of DNNs to extend computing into domains previously inaccessible. In at least one area of medical diagnostics—identifying diabetic retinopathy—DNN-based approaches already match expert human performance, but we have little experience yet to help us understand what kinds of bugs those systems may fall prey to when deployed in the real world.

DeepXplore brings a software-testing perspective to DNNs and, in doing so, creates the opportunity for an enormous amount of follow-on work. Much of the prior work on finding errors in DNNs focused on finding individual adversarial modifications of images, without an explicit focus on the diversity of computational paths the DNN takes to produce them. The metric introduced in DeepXplore—neuron coverage—is an analogue of the code-coverage metric traditionally used in software testing. This metric has utility beyond the techniques used in DeepXplore; security bug hunting, for example, has found coverage-guided fuzzing to be a powerful and effective technique, and the neuron-coverage metric and its derivatives can enable similar approaches in the DNN context.

I often tell students, when they are first starting to learn about research, that they should keep an eye out for the papers in an area that everyone else claims to have beaten: those are the papers that stimulated other researchers. DeepXplore will be such a paper. Its specific metrics and constraints on example generation are unlikely to be the final word in DNN testing, but the work that follows will exist because researchers saw these ideas and tried to improve upon them. The core framework from DeepXplore will likely endure: establish an effective coverage metric based upon the numerical values of the neural network's activations, and use a constrained search procedure to maximize coverage with respect to that metric.

David G. Andersen is a professor in the computer science department at Carnegie Mellon University, Pittsburgh, PA, USA, and is CTO of BrdgAI.

Copyright held by author.

To view the accompanying paper, visit doi.acm.org/10.1145/3361566
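To make the analogy to code coverage concrete, here is a minimal sketch of one simple notion of neuron coverage—a neuron counts as "covered" when its activation exceeds a threshold on at least one test input. This is an illustration, not DeepXplore's actual implementation; the toy network, its fixed weights, and the threshold are all hypothetical:

```python
def relu(v):
    return [max(x, 0.0) for x in v]

def layer(x, weights, biases):
    """One fully connected ReLU layer; weights[i][j] connects input i
    to neuron j (a hypothetical toy network, not a real DNN)."""
    pre = [sum(xi * w[j] for xi, w in zip(x, weights)) + biases[j]
           for j in range(len(biases))]
    return relu(pre)

def neuron_coverage(test_inputs, layers, threshold=0.0):
    """Fraction of neurons whose activation exceeds `threshold` on at
    least one test input -- one simple notion of neuron coverage."""
    covered = set()  # (layer_index, neuron_index) pairs observed active
    for x in test_inputs:
        acts = x
        for li, (w, b) in enumerate(layers):
            acts = layer(acts, w, b)
            for ni, a in enumerate(acts):
                if a > threshold:
                    covered.add((li, ni))
    total = sum(len(b) for _, b in layers)
    return len(covered) / total

# A tiny hypothetical 2-2-2 network with fixed weights.
layers = [
    ([[1.0, -1.0], [1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, -1.0], [-1.0, -1.0]], [0.0, 0.0]),
]
tests = [[1.0, 0.0], [0.0, 1.0]]
print(neuron_coverage(tests, layers))  # 3 of 4 neurons fire on some test: 0.75
```

A coverage-guided search, in the spirit the column describes, would then mutate test inputs (subject to domain constraints) and keep the mutants that drive this fraction upward, exactly as a coverage-guided fuzzer keeps inputs that reach new code.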