A feature inversion framework is proposed to provide local explanations. This framework inverts the representations at higher layers of a CNN to a synthesized image, while simultaneously encoding the location information of the target object in a mask.
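To make the inversion idea concrete, here is a minimal PyTorch sketch, assuming a pretrained VGG-16 as the CNN; the layer choice, step count, and learning rate are illustrative rather than taken from the cited framework, and the location mask it also learns is omitted.

```python
import torch
import torchvision.models as models

# Hypothetical setup: a pretrained VGG-16 stands in for "the CNN", and
# layer 20 stands in for "a higher layer".
cnn = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features.eval()
LAYER = 20

def layer_features(x):
    for i, module in enumerate(cnn):
        x = module(x)
        if i == LAYER:
            return x

target = torch.rand(1, 3, 224, 224)           # stand-in for a real input image
with torch.no_grad():
    target_feat = layer_features(target)

# Optimize a synthesized image so its layer-20 representation matches the
# target's; whatever is recognizable in `synth` is what the layer preserves.
synth = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([synth], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(layer_features(synth), target_feat)
    loss.backward()
    opt.step()
```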
Decomposition is another perspective for taking advantage of deep DNN representations. For instance, by modeling the information-flowing process of the hidden representation vectors in RNN models, an RNN prediction can be decomposed into the additive contributions of each word in the input text.12 The decomposition result quantifies the contribution of each individual word to the RNN prediction. These two explanation paradigms achieve promising results on a variety of DNN architectures, indicating that intermediate information indeed contributes significantly to attribution.
Furthermore, deep representations serve as a strong regularizer, increasing the possibility that the explanations faithfully characterize the behavior of DNNs under normal operating conditions. This reduces the risk of generating surprising artifacts and leads to more meaningful explanations.
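To illustrate the decomposition paradigm just described, the following is a minimal sketch assuming an RNN classifier with a linear readout on its final hidden state. Because the final state telescopes as a sum of per-step differences, the logit splits exactly into one additive term per word; the model and dimensions are illustrative, not those of the cited work.

```python
import torch
import torch.nn as nn

# Since h_T = sum_t (h_t - h_{t-1}) with h_0 = 0, a linear readout W h_T
# decomposes exactly into one additive contribution per word.
embed = nn.Embedding(10_000, 64)
rnn = nn.GRU(64, 128, batch_first=True)
readout = nn.Linear(128, 2)

tokens = torch.randint(0, 10_000, (1, 7))    # a stand-in 7-word input
outputs, _ = rnn(embed(tokens))              # hidden states h_1..h_T: (1, 7, 128)
h = torch.cat([torch.zeros(1, 1, 128), outputs], dim=1)  # prepend h_0 = 0

deltas = h[:, 1:] - h[:, :-1]                # per-step state changes
contribs = deltas @ readout.weight.T         # (1, 7, 2): one row per word
logits = contribs.sum(dim=1) + readout.bias  # equals readout(h_T) exactly
```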
Interpretable machine learning has
numerous applications. We introduce
three representative ones: model validation, model debugging, and knowledge discovery.
Model validation. Explanations could help examine whether a machine learning model has employed true evidence instead of biases that are widespread in training data. For instance, a post-hoc attribution approach analyzes three question answering models.
The attribution heatmaps show these
models often ignore important parts
of the questions and rely on irrelevant
words to make decisions. They further indicate that the weakness of the models is caused by inadequacies in the training data. Possible solutions to this problem include modifying the training data or introducing inductive bias when training the model. More seriously, machine learning models may rely on gender and ethnic biases to make decisions.9 Interpretability could be exploited to identify whether models have utilized these biases, to ensure models do not violate ethical and legal requirements.
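As a minimal sketch of the kind of attribution heatmap used in such analyses, the code below applies gradient-times-input attribution; the toy model and token IDs are placeholders, not the cited question answering models.

```python
import torch
import torch.nn as nn

# Toy classifier over word embeddings; embedding table, pooling, and head
# are illustrative stand-ins.
embed = nn.Embedding(10_000, 64)
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 2))

tokens = torch.randint(0, 10_000, (1, 12))   # a stand-in 12-word question
emb = embed(tokens).detach().requires_grad_(True)
logits = model(emb).mean(dim=1)              # mean-pool word scores: (1, 2)

logits[0, logits.argmax()].backward()        # gradient of the predicted class
heatmap = (emb.grad * emb).sum(dim=-1)       # one relevance score per word
# Near-zero scores on words that should matter (for example, the question
# words) would signal the reliance on irrelevant evidence described above.
```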
Model debugging. Explanations can also be employed to debug and analyze the misbehavior of models when they give wrong and unexpected predictions. A representative example is adversarial learning.26 Recent work demonstrated that machine learning models, such as DNNs, can be guided into making erroneous predictions with high confidence when processing accidentally or deliberately crafted inputs.20,26 However, these inputs are easily recognized by humans.
In this case, explanations help humans identify possible model deficiencies and analyze why the models may fail. More importantly, we may further take advantage of human knowledge to find solutions that improve the performance and reasonability of models.
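As one concrete way such deliberately crafted inputs arise, here is a minimal PyTorch sketch of the fast gradient sign method; the classifier and perturbation budget are illustrative, not a method endorsed by the article.

```python
import torch
import torch.nn as nn

# Placeholder classifier for 28x28 grayscale inputs; epsilon = 0.1 is an
# illustrative perturbation budget.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 1, 28, 28, requires_grad=True)   # stand-in input image
y = torch.tensor([3])                              # its true label

loss_fn(model(x), y).backward()
x_adv = (x + 0.1 * x.grad.sign()).clamp(0, 1).detach()
# `x_adv` is nearly indistinguishable from `x` to a human, yet can flip
# the prediction; explaining the flipped prediction helps localize the
# model deficiency.
```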
Knowledge discovery. The derived explanations also allow humans to obtain new insights from machine learning models by comprehending their decision-making process. With explanations, domain experts and end users can provide informed feedback. Eventually, new science and new knowledge, originally hidden in the data, can be extracted. For instance,
a rule-based interpretable model has
been utilized to predict the mortality
risk for patients with pneumonia.
One of the rules from the model suggests
having asthma could lower a patient’s
risk of dying from pneumonia. This turned out to be true: patients with asthma were given more aggressive treatments, which led to better outcomes.
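For illustration only, here is a sketch of the form such a rule-based risk model takes; the features and thresholds below are hypothetical and are not the rules actually learned for the pneumonia task.

```python
# Hypothetical rule list illustrating the *form* of such a model.
def mortality_risk(patient: dict) -> str:
    if patient.get("has_asthma"):
        # A readable rule like this is what let domain experts spot the
        # counterintuitive asthma association and trace its cause.
        return "low"
    if patient.get("age", 0) > 85:
        return "high"
    return "medium"

print(mortality_risk({"has_asthma": True, "age": 90}))  # -> "low"
```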
Despite recent progress in interpretable machine learning, some urgent challenges remain, especially in explanation method design and evaluation.
Explanation method design. The first challenge is related to method design, especially for post-hoc explanation. We argue that an explanation method should be restricted to truly reflect the model behavior under normal operating conditions. This criterion has two meanings. First, the explanations should be faithful to the mechanism of the underlying machine learning models.12 Post-hoc explanation methods propose to approximate the behavior of models. Sometimes the approximation is not sufficiently accurate, and the explanation may fail to precisely reflect the actual operating status of the original model. For instance, an explanation method may give an explanation that makes sense to humans, while the machine learning model works in an entirely different way. Second, even when explanations are of high fidelity to the underlying models, they may fail to represent the model behavior under normal conditions. Model explanation and surprising artifacts are often two sides of the same coin. The explanation process could generate examples that are out of distribution with respect to the statistics of the training dataset, including nonsensical inputs and adversarial examples,16 which are beyond the capability of current machine learning models. Without careful design, both global and local explanations may trigger the artifacts of machine learning models rather than produce meaningful explanations.
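One simple precaution suggested by this discussion is to screen explanation-generated inputs against training statistics. The sketch below uses a per-feature z-score heuristic; the threshold and feature setup are illustrative choices, not a method from the literature discussed here.

```python
import numpy as np

# Illustrative heuristic: flag a candidate input whose features sit many
# standard deviations from the training mean. The threshold is arbitrary.
def out_of_distribution(x, mean, std, z_max=6.0):
    z = np.abs((x - mean) / (std + 1e-8))
    return bool((z > z_max).any())

train = np.random.rand(1000, 20)             # stand-in training features
mu, sigma = train.mean(axis=0), train.std(axis=0)
synthesized = np.full(20, 50.0)              # an implausible generated input
print(out_of_distribution(synthesized, mu, sigma))  # -> True
```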
Explanation method evaluation. The second challenge involves method evaluation. Below, we introduce the evaluation challenges for intrinsic explanation and post-hoc explanation.
The challenge for intrinsic explanation mainly lies in how to quantify interpretability. There is a broad set of interpretable models designed according to distinct principles, with various forms of implementation. Take recommender systems as an example: both interpretable latent topic models and attention mechanisms could provide some degree of interpretability. Nevertheless, how can we compare the interpretability of a globally interpretable model with that of a locally interpretable one? There is still no consensus on what interpretability means and how to measure it.
Doshi-Velez and Kim propose three types of metrics: application-grounded metrics, human-grounded metrics, and functionally grounded metrics. These metrics are complementary to each other, with their own pros and cons regarding the degree of validity and the cost of performing evaluations. Which metrics to adopt depends heavily on the task at hand.
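As an example of what a functionally grounded metric can look like, the sketch below scores an attribution vector by its sparsity, on the assumption that explanations concentrating on few features are easier for people to read; this particular proxy and its threshold are illustrative, not prescribed by the cited work.

```python
import numpy as np

# Illustrative proxy: the fewer features needed to cover 90% of the
# attribution mass, the higher the interpretability score.
def sparsity_score(attribution, mass=0.9):
    a = np.sort(np.abs(attribution))[::-1]
    total = a.sum()
    if total == 0:
        return 0.0
    k = int(np.searchsorted(np.cumsum(a) / total, mass)) + 1
    return 1.0 - k / len(a)

print(sparsity_score(np.array([5.0, 0.1, 0.1, 0.1])))  # -> 0.75
```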
For post-hoc explanation, compared with evaluating its interpretability, it is