Future of Adversarial Machine Learning
Adversarial machine learning is at a
turning point. In the context of adversarial inputs at test time, we have
several effective attack algorithms but
few strong countermeasures. Can we
expect this situation to continue indefinitely? Can we expect an arms race
with attackers and defenders repeatedly seizing the upper hand in turn? Or
can we expect the defender to eventually gain a fundamental advantage?
We can explain adversarial examples in current machine learning models as the result of unreasonably linear extrapolation9 but do not know what will happen when we fix this problem.

The natural way to test robustness to adversarial examples is simply to evaluate the accuracy of the model on a test set that has been adversarially perturbed to create adversarial examples.30
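To make this concrete, a minimal sketch of such an evaluation might look as follows, assuming a trained PyTorch classifier (model), a test_loader yielding batches of images with pixel values in [0, 1] together with their labels, and a perturbation budget eps; all of these names are placeholders rather than part of any particular library.

import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, eps):
    # One-step fast gradient sign perturbation: move each input feature by
    # eps in the direction that increases the model's loss on the true label.
    # model, x, y are hypothetical placeholders (any torch.nn.Module classifier
    # and a batch of inputs/labels).
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in the valid range

def adversarial_accuracy(model, test_loader, eps=0.1):
    # Accuracy of the model on an adversarially perturbed copy of the test set.
    model.eval()
    correct, total = 0, 0
    for x, y in test_loader:
        x_adv = fgsm_perturb(model, x, y, eps)
        with torch.no_grad():
            pred = model(x_adv).argmax(dim=1)
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total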
Unfortunately, testing is insufficient
for providing security guarantees, as
an attacker can send inputs that differ from the inputs used for the testing
process. For example, a model that is
tested and found to be robust against
the fast gradient sign method of adversarial example generation9 may be vulnerable to computationally expensive
methods like attacks based on L-BFGS. 30
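For instance, the one-step evaluation sketched above can be extended with an iterative attack. The following sketch uses projected gradient descent, an iterative variant of the fast gradient sign method, as a stand-in for the more expensive optimization-based attacks mentioned here; it assumes the same hypothetical model, inputs, and budget as before.

import torch
import torch.nn.functional as F

def pgd_perturb(model, x, y, eps, step=0.01, iters=40):
    # Iterative attack: repeatedly take small gradient-sign steps and project
    # the result back into the max-norm ball of radius eps around the input.
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step * grad.sign()
            x_adv = x_orig + (x_adv - x_orig).clamp(-eps, eps)  # projection step
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

A model that appears robust under the one-step attack may still fail badly under such an iterative attack at the same budget.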
In general, testing is insufficient
because it provides a “lower bound”
on the failure rate of the system when
an “upper bound” is necessary for providing security guarantees. Testing
identifies n inputs that cause failure,
so the engineer can conclude that at
least n inputs cause failure; the engineer would prefer to have a means of
becoming reasonably confident that at
most n inputs cause failure.
Putting this in terms of security, a
defense should provide a measurable
guarantee that characterizes the space
of inputs that cause failures. Conversely,
the common practice of testing can only
provide instances that cause error and is
thus of limited value in understanding
the robustness of a machine learning
system. Development of an input-characterizing guarantee is central to the
future of machine learning in adversarial settings and will almost certainly be
grounded in formal verification.
Theoretical verification of machine learning. Verification of machine learning model robustness to adversarial examples is in its infancy. Current approaches verify that a classifier assigns the same class to all points within a specified neighborhood of a point x. Huang et al.13 developed the first verification system for demonstrating that the output class is constant across a desired neighborhood. This first system uses an SMT solver. Its scalability is limited, and scaling to large models requires making strong assumptions (such as that only a subset of the units of the network is relevant to the classification of a particular input point). Such assumptions mean the system can no longer provide an absolute guarantee of the absence of adversarial examples, as an adversarial example that violates the assumptions could evade detection. Reluplex15 is another verification system that uses linear programming solvers to scale to much larger networks. Reluplex becomes much more efficient by specializing in rectified linear networks6,14,18 and their piecewise linear structure.
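Stated in notation the article does not itself introduce, with f_j(x) denoting the classifier's score for class j and the neighborhood taken, for example, to be a max-norm ball of radius ε, the property these systems certify for a given point x can be sketched as

\[
\forall\, x' :\ \lVert x' - x \rVert_\infty \le \varepsilon \ \Longrightarrow\ \arg\max_j f_j(x') = \arg\max_j f_j(x)
\]

The systems differ mainly in how this statement is checked (an SMT solver in the case of Huang et al.,13 linear programming solvers specialized to rectified linear networks in the case of Reluplex15), not in the form of the statement itself.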
These current verification systems are limited in scope because they verify only that the output class remains constant in some specified neighborhood of some specific point x. It is infeasible for a defender to fully anticipate all future attacks when specifying the neighborhood surrounding x to verify. For instance, a defender may use a verification system to prove there are no adversarial examples within a max-norm ball of radius ε, but an attacker may then devise a new way of modifying x that should leave the class unchanged yet has a large max-norm distance from x. An even greater challenge is verifying the behavior of the system near new test points x′.
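As a hypothetical illustration of why a fixed max-norm ball can miss natural, class-preserving changes, the short sketch below compares an image to a copy of itself shifted by a single pixel; the random array stands in for a real test image.

import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))               # random stand-in for a test image x, pixels in [0, 1]
shifted = np.roll(image, shift=1, axis=1)  # circular one-pixel horizontal shift of the image

# Max-norm (L-infinity) distance between the original and shifted images.
print(np.max(np.abs(shifted - image)))     # close to 1.0 here, far larger than a typical small eps

For natural images the same effect appears at any high-contrast edge: a one-pixel shift changes some pixel values almost completely, so the shifted image lies far outside a small max-norm ball around the original even though a human would assign it the same class.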
In a traditional machine learning setting there are clear theoretical limits as to how well a machine learning system can be expected to perform on new test points. For example, the "no free lunch theorem"34 states that all supervised classification algorithms have the same accuracy on new test points when averaged over all possible datasets.
One important open theoretical question is: Can the no-free-lunch theorem be extended to the adversarial setting? If we assume attackers operate by making small perturbations to the test set, then the premise of the no-free-lunch theorem, where the average is taken over all possible datasets, including those where small perturbations should not be ignored by the classifier, no longer applies. Depending on the resolution of this question, the arms race between attackers and defenders could have two different outcomes. The attacker might fundamentally have the advantage due to inherent statistical difficulties associated with predicting the correct value for new test points. If we are fortunate, the defender might have a fundamental advantage for a broad set of problem classes, paving the way for the design and verification of algorithms with robustness guarantees.
Reproducible testing with CleverHans. While verification is a challenge even from a theoretical point of view, straightforward testing can likewise be a challenge from a practical point of view. Suppose a researcher proposes a new defense procedure and evaluates that defense against a particular adversarial example attack procedure. If the resulting model obtains high accuracy, does it mean the defense was effective? Possibly, but it could also mean the researcher's implementation of the attack was weak. A similar problem occurs when researchers test a proposed attack technique against their own implementation of a common defense procedure.

To resolve these difficulties, we created the CleverHans19 library with reference implementations of several attack and defense procedures. Researchers and product developers can use CleverHans to test their models against standardized, state-of-the-art attacks and defenses. This way, if a defense obtains high accuracy against a CleverHans attack, the test would show that the defense overcomes this standard implementation of the attack; and if an attack causes a model protected by a CleverHans defense to fail at a high rate, the test would show that the attack is able to defeat a rigorous implementation of the defense. While such standardized testing of attacks and defenses is in no way a substitute for rigorous verification, it does provide a common benchmark. Moreover, results in published research are comparable to one another, so long as they are produced with the same version of CleverHans in similar computing environments.
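In practice, such a benchmark might look like the following sketch. It assumes the PyTorch interface exposed by more recent CleverHans releases, in which attacks such as fast_gradient_method and projected_gradient_descent are provided as functions taking the model, a batch of inputs, and a perturbation budget; the exact import paths and signatures should be checked against the installed version, and net, test_loader, and the budget are placeholders.

# Assumes the PyTorch interface of a recent CleverHans release; check the
# import paths and attack signatures against the installed version.
import numpy as np
import torch
from cleverhans.torch.attacks.fast_gradient_method import fast_gradient_method
from cleverhans.torch.attacks.projected_gradient_descent import projected_gradient_descent

def benchmark(net, test_loader, eps=0.3):
    # Accuracy of net on clean inputs and under two CleverHans reference attacks.
    net.eval()
    hits = {"clean": 0, "fgm": 0, "pgd": 0}
    total = 0
    for x, y in test_loader:
        x_fgm = fast_gradient_method(net, x, eps, np.inf)
        x_pgd = projected_gradient_descent(net, x, eps, 0.01, 40, np.inf)
        for name, batch in (("clean", x), ("fgm", x_fgm), ("pgd", x_pgd)):
            with torch.no_grad():
                pred = net(batch).argmax(dim=1)
            hits[name] += (pred == y).sum().item()
        total += y.numel()
    return {name: count / total for name, count in hits.items()}

Reporting clean accuracy alongside accuracy under each reference attack keeps results comparable across papers that use the same CleverHans version.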