Figure 3. Heatmap of DNN-specific local explanation: (a) original input, (b) back-propagation, (c) mask perturbation, and (d) investigation of representations.
These methods are integrated into a unified framework where all methods are reformulated as a modified gradient function.2 This unification enables comprehensive comparison between different methods and facilitates effective implementation in modern deep-learning libraries such as TensorFlow and PyTorch. Back-propagation-based methods are efficient to compute, as they usually need only a few forward and backward passes. On the other hand, they are limited by their heuristic nature and may generate explanations of unsatisfactory quality that are noisy and highlight irrelevant features, as shown in Figure 3b.
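To make the "modified gradient function" view concrete, here is a minimal PyTorch sketch (not code from the cited work) that changes the backward rule of ReLU layers so that negative gradient values are discarded; `model`, `x`, and `target_class` are assumed placeholders, and the model is assumed to use non-inplace nn.ReLU activations.

```python
import torch
import torch.nn as nn

def _clamp_negative_grads(module, grad_input, grad_output):
    # Modified gradient rule: discard negative gradient values at each ReLU.
    return (torch.clamp(grad_input[0], min=0.0),)

def explain_with_modified_gradient(model, x, target_class):
    """Back-propagation-based explanation expressed as a modified gradient function."""
    handles = [m.register_full_backward_hook(_clamp_negative_grads)
               for m in model.modules() if isinstance(m, nn.ReLU)]
    x = x.clone().requires_grad_(True)
    model(x)[0, target_class].backward()   # one forward and one backward pass
    for h in handles:
        h.remove()                         # restore the ordinary gradient rule
    return x.grad.detach()
```

Swapping in a different backward rule yields a different explanation method, while the rest of the procedure stays the same.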
Mask perturbation. The previously mentioned model-agnostic perturbation can be computationally very expensive when handling a high-dimensional instance, since the input features must be perturbed sequentially. In contrast, DNN-specific perturbation can be implemented efficiently through mask perturbation and gradient-descent optimization. One representative work formulates the perturbation in an optimization framework to learn a perturbation mask, which explicitly preserves the contribution values of each feature.13 Note that this framework generally needs to impose various regularizations on the mask to produce meaningful explanations rather than surprising artifacts.
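The following is a minimal sketch of such an optimization, assuming an image classifier `model`, a single input `x` of shape (1, C, H, W), a zero baseline as the reference value, and a simple L1 regularizer; the cited work uses more elaborate perturbation operators and regularization terms.

```python
import torch

def learn_perturbation_mask(model, x, target_class, steps=300, lam=0.05, lr=0.1):
    """Learn a per-feature mask: deleting the regions the mask selects should
    lower the target class score, while an L1 penalty keeps the mask sparse."""
    logits_mask = torch.zeros_like(x, requires_grad=True)
    baseline = torch.zeros_like(x)                 # reference value used for deletion
    optimizer = torch.optim.Adam([logits_mask], lr=lr)  # only the mask is optimized
    for _ in range(steps):                         # hundreds of forward/backward passes
        m = torch.sigmoid(logits_mask)             # keep the mask in [0, 1]
        perturbed = m * baseline + (1.0 - m) * x   # delete the selected regions
        score = torch.softmax(model(perturbed), dim=1)[0, target_class]
        loss = score + lam * m.abs().mean()        # drop the score + regularize the mask
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return torch.sigmoid(logits_mask).detach()     # high values mark influential features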
Although the optimization-based framework has drastically boosted the efficiency, generating an explanation still needs hundreds of forward and backward operations. To enable a more computationally efficient implementation, a DNN model can be trained to predict the attribution scores.8 Once this mask neural network model is obtained, it requires only a single forward pass to yield attribution scores for an input.
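As an illustration only, such a masking network for image inputs might look like the hypothetical `MaskNet` below; the architecture is made up for the sketch, and the network would be trained offline with an objective similar to the per-instance one above.

```python
import torch
import torch.nn as nn

class MaskNet(nn.Module):
    """Hypothetical masking network: trained once, it predicts an attribution
    mask for any input with a single forward pass."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid(),   # per-pixel scores in [0, 1]
        )

    def forward(self, x):
        return self.net(x)

def attribute(mask_net, x):
    # Explanation time: no per-instance optimization, just one forward pass.
    with torch.no_grad():
        return mask_net(x)
```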
Investigation of deep representations. Both perturbation- and back-propagation-based explanations ignore the deep representations of the DNN, which might contain rich information for interpretation. To bridge this gap, some studies explicitly utilize the deep representations of the input to derive explanations. Based on the observation that deep CNN representations capture the high-level content of input images as well as their spatial arrangement, a guided
Figure 4. Progress of interpretable ML. The current stage is researcher-oriented explanations. We can make them more faithful and accurate, which can be further utilized to promote model generalization ability, and then develop user-friendly explanations. (In the figure, these stages are arranged along an axis of increasing difficulty of tasks.)
its prediction? Thus, the results may be called counterfactual explanations. The perturbation is performed across features sequentially to determine their contributions and can be implemented in two ways: omission and occlusion. For omission, a feature is directly removed from the input, but this might be impractical since few models allow setting features as unknown. For occlusion, the feature is replaced with a reference value, such as zero for word embeddings or a specific gray value for image pixels. Nevertheless, occlusion raises a new concern: newly introduced evidence may be used by the model as a side effect.8 For instance, if we occlude part of an image with a green color, we may provide undesirable evidence for the grass class. Thus, we should be particularly cautious when selecting reference values to avoid introducing extra pieces of evidence.
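As a sketch of occlusion in practice (assuming an image classifier `model`, an input tensor `x` of shape (1, C, H, W), and a neutral gray fill as the reference value; the patch size is arbitrary):

```python
import torch

def occlusion_attribution(model, x, target_class, patch=8, fill=0.5):
    """Slide a patch of a reference value over the image and record how much
    the target class probability drops; large drops mark important regions."""
    model.eval()
    with torch.no_grad():
        base = torch.softmax(model(x), dim=1)[0, target_class]
        _, _, h, w = x.shape
        heat = torch.zeros((h + patch - 1) // patch, (w + patch - 1) // patch)
        for i in range(0, h, patch):
            for j in range(0, w, patch):
                occluded = x.clone()
                occluded[:, :, i:i + patch, j:j + patch] = fill   # reference value
                score = torch.softmax(model(occluded), dim=1)[0, target_class]
                heat[i // patch, j // patch] = base - score       # contribution of this patch
    return heat
```

Note that `fill` plays the role of the reference value discussed above; choosing it carelessly (for example, a green color) would introduce exactly the kind of extra evidence the text warns about.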
Model-specific explanation. There are also explanation approaches exclusively designed for a specific type of model. Here, we introduce DNN-specific methods that treat the networks as white boxes and explicitly utilize the interior structure to derive explanations. We divide them into three major categories: back-propagation-based methods in a top-down manner; perturbation-based methods in a bottom-up manner; and investigation of deep representations in intermediate layers.
Back-propagation. These methods calculate the gradient, or its variants, of a particular output with respect to the input using back-propagation to derive the contribution of features. In the simplest case, we can back-propagate the gradient directly.33 The underlying hypothesis is that a larger gradient magnitude represents a more substantial relevance of a feature to a prediction. Other approaches back-propagate different forms of signals to the input, such as discarding negative gradient values during the back-propagation process,34 or back-propagating the relevance of the final prediction score to the input layer.3
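For the simplest case described above, the explanation is just the input gradient; a minimal PyTorch sketch, with `model`, `x`, and `target_class` as assumed placeholders:

```python
import torch

def gradient_saliency(model, x, target_class):
    """Vanilla gradient explanation: one forward and one backward pass."""
    model.eval()
    x = x.clone().requires_grad_(True)
    score = model(x)[0, target_class]   # scalar score for the class of interest
    score.backward()                    # back-propagate to the input
    # Hypothesis: larger gradient magnitude = more relevant feature.
    return x.grad.detach().abs()
```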