Wyoming, and colleagues developed
what they referred to as “evolved
images.” Some were regular patterns
with added noise; others looked like
the static from an analog TV broadcast. Both were just abstract images
to humans, but these evolved images
would be classified by DNNs trained
on conventional photographs as
cheetahs, armadillos, motorcycles,
and whatever else the system had
been trained to recognize.
A 2017 attack centered on road-sign recognition demonstrated the
potential vulnerability of self-driving
cars and other robotic systems to attacks that took advantage of this unexpected property. In an experiment,
a team from the University of Michigan, Ann Arbor, showed that brightly colored stickers attached to a stop sign could make a DNN register it as a 45-mph speed-limit sign.
This and similar attacks encouraged
the U.S. Defense Advanced Research
Projects Agency (DARPA) to launch a
project at the beginning of this year
to try to develop practical defenses
against these attacks.
The key issue that has troubled
deep learning researchers is why deep
learning models seem to be fooled by
what looks to humans like noise.
Although experiments by James DiCarlo, a professor of neuroscience at the Massachusetts Institute of Technology (MIT), and others have shown similarities between the gross structure of the visual cortexes of primates and that of DNNs, it has become clear that machine learning models make decisions based on information the brain either does not perceive or simply ignores.
In work published earlier this year,
the student-run Labsix group based
at MIT found that features recognized by
DNNs can be classified into groups
they call robust and non-robust.
Andrew Ilyas, a Ph.D. student and
Labsix member, says robust features
are those that continue to deliver the
correct results when the pixels they
cover are changed by small amounts,
as in Szegedy’s experiments. “For in-
stance, even if you perturb a ‘floppy
ear’ by a small pixel-wise perturbation,
it is still indicative of the ‘dog’ class.”
Non-robust features, on the other
hand, may be textures or fine details
that can be disguised by lots of tiny
changes to pixel intensity or color.
“Imagine that there is a pattern that
gives away the true class, but is very
faint,” Ilyas suggests. It does not take
much to hide it or change it to resemble a non-robust feature from a completely different class.
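
To make the distinction concrete, here is a minimal sketch of the kind of small, pixel-wise perturbation Ilyas describes, written in PyTorch against a toy, randomly initialized classifier. The toy model, the random input, and the budget eps are illustrative assumptions, not the networks or data used by Labsix.

# One-step "fast gradient sign" perturbation: nudge every pixel slightly
# in the direction that most increases the classifier's loss.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(                               # toy classifier, random weights
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 10),
)
model.eval()

x = torch.rand(1, 3, 32, 32, requires_grad=True)     # stand-in "image"
logits = model(x)
pred = logits.argmax(dim=1)                          # current prediction

loss = nn.functional.cross_entropy(logits, pred)     # loss at the current label
loss.backward()                                      # gradient w.r.t. the pixels

eps = 0.03                                           # small L-infinity budget
x_adv = (x + eps * x.grad.sign()).clamp(0, 1)        # barely visible change
print("before:", pred.item(), "after:", model(x_adv).argmax(dim=1).item())

With this random toy model the label may or may not change; against a trained DNN, perturbations of exactly this form are what routinely push a non-robust feature toward a completely different class.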
In work similar to that of the
Labsix group, Haohan Wang and
colleagues at Carnegie Mellon Uni-
versity found that filtering out high-
frequency information from images
worsened the performance of DNNs
they tested. Ilyas stresses that the
work his group performed demon-
strated that the subtle features are
useful and representative, but they
are easy to subvert, underlining, he
says, “a fundamental misalignment
between humans and machine-
learning models.”
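
The high-frequency test is simple to reproduce in outline. The sketch below uses NumPy's FFT routines to keep only the low spatial frequencies of an image before it is handed to a classifier; the cut-off radius and the random stand-in image are assumptions for illustration, not the settings used by Wang and colleagues.

# Low-pass an image in the frequency domain: frequencies outside `radius`
# are zeroed, removing fine texture while leaving coarse shape intact.
import numpy as np

def low_pass(image, radius):
    spectrum = np.fft.fftshift(np.fft.fft2(image, axes=(0, 1)), axes=(0, 1))
    h, w = image.shape[:2]
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 <= radius ** 2
    if image.ndim == 3:                              # broadcast the mask over channels
        mask = mask[..., None]
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask, axes=(0, 1)),
                            axes=(0, 1))
    return np.real(filtered)

img = np.random.rand(224, 224, 3)                    # stand-in for a real photograph
blurred = low_pass(img, radius=20)                   # humans still see the object;
                                                     # a DNN may lose the cues it used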
Researchers have proposed a
battery of methods to try to de-
fend against adversarial examples.
Many have focused on the tendency
of DNNs to home in on the more
noise-like, non-robust features.
However, attacks are not limited to
those features, as various attempts
at countermeasures have shown.
In one case, a team of researchers
working at the University of Mary-
land used a generative adversarial
network (GAN) similar to those
used to synthesize convincing pic-
tures of celebrities. This GAN rebuilt
source images without the high-
frequency noise associated with
most adversarial examples and was, for a time, effective against attacks that rely on such noise.
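
The reconstruction idea can be sketched in a few lines: search the generator's latent space for a code whose output matches the incoming image, then classify that clean reconstruction instead of the raw input. The generator, classifier, and optimization settings below are placeholders, not the Maryland team's actual models.

# Project a (possibly perturbed) input onto the range of a generator, which
# cannot reproduce the attacker's high-frequency noise.
import torch

def purify(x, generator, latent_dim=128, steps=200, lr=0.05):
    z = torch.randn(x.shape[0], latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((generator(z) - x) ** 2).mean()      # match the input in pixel space
        loss.backward()
        opt.step()
    return generator(z).detach()                     # reconstruction, noise removed

# prediction = classifier(purify(x_adv, generator))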
The most resilient approach so far
is that of adversarial training. This
technique provides the DNN with a
series of examples during the train-
ing phase that try to force the model
to ignore features that have been shown to be vulnerable. It is a technique that
comes with a cost: experiments have
revealed that such training can just
as easily hurt the performance of
the DNN on normal test images; net-
works begin to lose their ability to
generalize and classify new images
correctly. They start overfitting to the
training data.
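
In outline, adversarial training replaces each clean training batch with a perturbed copy crafted against the current model. The sketch below uses a standard projected-gradient attack inside the training loop; the model, data loader, and attack budget are assumptions for illustration rather than any specific published recipe.

import torch
import torch.nn.functional as F

def pgd(model, x, y, eps=0.03, alpha=0.01, steps=7):
    # Projected gradient descent inside an L-infinity ball of radius eps.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = x + (x_adv - x).clamp(-eps, eps)     # stay inside the budget
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()

def adversarial_training_epoch(model, loader, optimizer):
    for x, y in loader:
        x_adv = pgd(model, x, y)                     # worst-case copy of the batch
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()  # fit the perturbed images
        optimizer.step()

Because the model never gets credit for leaning on fragile, non-robust cues, it is pushed toward the robust ones, at the cost to clean accuracy described above.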
“When training our model with
adversarial training, we explicitly dis-
courage it from relying on non-robust
features. Thus, we are forcing it to
ignore information in the input that
would be useful for classification,”
Ilyas notes. “One could argue, how-
ever, that the loss in accuracy is not
necessarily a bad thing.”
Ilyas points out that the lower accuracy of robust models is probably a more realistic estimate of a machine learning model's performance if we expect DNNs to recognize images in the same way humans do.
Ilyas says one aim of the Labsix work
is to close the gap between human
and machine by forcing the DNNs to
home in on larger features. This will
have the effect of making it easier for
humans to interpret why the models
make the mistakes they do.
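
The trade-off Ilyas describes is usually reported as two numbers: accuracy on unmodified test images and accuracy on adversarially perturbed copies of the same images. A minimal way to compute both is sketched below; model, test_loader, and the attack callable (for example, the pgd function sketched earlier) are assumed to exist.

import torch

@torch.no_grad()
def clean_accuracy(model, test_loader):
    correct = total = 0
    for x, y in test_loader:
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

def robust_accuracy(model, test_loader, attack):
    correct = total = 0
    for x, y in test_loader:
        x_adv = attack(model, x, y)                  # gradients needed to craft x_adv
        with torch.no_grad():
            correct += (model(x_adv).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return correct / total

# e.g. clean_accuracy(model, test_loader) vs. robust_accuracy(model, test_loader, pgd)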
However, with conventional DNN
architectures, there is still some way
to go to close the gap with humans,
even if non-robust features are re-
moved from the process. A team led
by Jörn-Henrik Jacobsen, a post-
doctoral researcher at the Vector In-
stitute in Toronto, Canada, found it is
possible for completely different imag-
es to lead to the same prediction. Not
only that, adversarially trained DNNs
that focus on robust features seem to
be more susceptible to this problem.
A statistical analysis performed
by Oscar Deniz, associate profes-
sor at the Universidad de Castilla-La
Mancha in Spain, and his colleagues
suggests a deeper issue with machine