The activation vectors of an active capsule can represent various semantic-aware concepts, such as the position and pose of a particular object. This property makes capsule networks more comprehensible to humans. However, there are often trade-offs between prediction accuracy and interpretability when constraints are directly incorporated into models: more interpretable models may suffer reduced prediction accuracy compared with less interpretable ones.
Interpretable model extraction. An alternative is to apply interpretable model extraction, also referred to as mimic learning,36 which may not have to sacrifice model performance as much. The motivation behind mimic learning is to approximate a complex model using an easily interpretable model such as a decision tree, rule-based model, or linear model. As long as the approximation is sufficiently close, the statistical properties of the complex model are reflected in the interpretable model. Eventually, we obtain a model with comparable prediction performance whose behavior is much easier to understand.
For instance, a tree ensemble model can be transformed into a single decision tree.36 Moreover, a DNN can be utilized to train a decision tree that mimics the input-output function captured by the neural network, so that the knowledge encoded in the DNN is transferred to the decision tree.5 To avoid overfitting of the decision tree, active learning is applied during training. These techniques convert the original model into a decision tree with better interpretability while maintaining comparable predictive performance.
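To make the idea concrete, the following minimal sketch trains a shallow decision tree to mimic the predictions of a more complex model using scikit-learn. The random forest teacher, the tree depth, and the fidelity check are illustrative assumptions, not details taken from the cited works.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data and a complex "teacher" model trained on the true labels.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
teacher = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Relabel the training data with the teacher's predictions, then fit an
# interpretable "student" tree that mimics the teacher's input-output behavior.
student = DecisionTreeClassifier(max_depth=4, random_state=0)
student.fit(X_train, teacher.predict(X_train))

# Fidelity: how closely the student reproduces the teacher on unseen data.
fidelity = (student.predict(X_test) == teacher.predict(X_test)).mean()
print(f"fidelity to teacher: {fidelity:.3f}")
print(export_text(student))  # the extracted tree itself serves as the explanation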
Locally interpretable models are usually achieved by designing more justified model architectures that can explain why a specific decision is made. Unlike globally interpretable models, which offer a certain degree of transparency about what is going on inside a model, locally interpretable models provide users with an understandable rationale for a specific prediction.
A representative scheme is employing the attention mechanism,4,38 which is widely utilized to explain predictions made by sequential models, for example, recurrent neural networks (RNNs). The attention mechanism is advantageous in that it gives users the ability to interpret which parts of the input are attended to by the model, through visualizing the attention weight matrix for individual predictions. The attention mechanism has been used to solve the problem of generating image captions.38 In this case, a CNN is adopted to encode an input image into a vector, and an RNN with attention mechanisms is utilized to generate descriptions. When generating each word, the model shifts its attention to reflect the relevant parts of the image. The final visualization of the attention weights can tell humans what the model is looking at when generating a
word. Similarly, the attention mechanism has been incorporated into machine translation.4 At the decoding stage, the neural attention module added to the neural machine translation (NMT) model assigns different weights to the hidden states of the encoder, which allows the decoder to selectively focus on different parts of the input sentence at each step of the output generation. Through visualizing the attention scores, users can understand how words in one language depend on words in another language for correct translation.
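The sketch below illustrates, under simplifying assumptions, how such attention weights can be computed and arranged into the matrix that gets visualized. It uses a generic dot-product attention with random toy states rather than the exact mechanisms of the cited captioning and NMT models; all names and sizes are illustrative.

import numpy as np

def attention_weights(decoder_state, encoder_states):
    # One decoding step: score each encoder position against the current
    # decoder state, then normalize the scores into a distribution.
    scores = encoder_states @ decoder_state / np.sqrt(decoder_state.size)
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
src_len, tgt_len, d = 6, 4, 8                 # toy sentence lengths and hidden size
encoder_states = rng.normal(size=(src_len, d))
decoder_states = rng.normal(size=(tgt_len, d))

# Rows correspond to generated target words, columns to source positions.
# Plotting this matrix as a heat map is the usual attention visualization.
weight_matrix = np.stack([attention_weights(s, encoder_states) for s in decoder_states])
print(np.round(weight_matrix, 2))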
Post-Hoc Global Explanation
Machine learning models automatically learn useful patterns from a huge amount of training data and retain the learned knowledge in their model structures and parameters. Post-hoc global explanation aims to provide a global understanding of what knowledge has been acquired by these pretrained models, and to present the parameters or learned representations in a manner intuitive to humans. We classify existing models into two categories: traditional machine learning and deep learning pipelines (see Figure 2), since similar explanation paradigms can be extracted within each category. Here, we introduce how to provide explanations for these two types of pipelines.
Traditional machine learning explanation. Traditional machine learning pipelines mostly rely on feature engineering, which transforms raw data into features that better represent the predictive task, as shown in Figure 2. The features are generally interpretable, and the role of machine learning is to map the representation to the output. We consider a simple yet effective explanation measure that is applicable to most models belonging to the traditional pipeline, called feature importance, which indicates the statistical contribution of each feature to the underlying model when making decisions.
Model-agnostic explanation. Model-agnostic feature importance is broadly applicable to various machine learning models. It treats a model as a black box and does not inspect internal model parameters.
A representative approach is "permutation feature importance." The key idea is that the importance of a specific feature to the overall performance of a model can be determined by calculating how much the model's prediction accuracy deviates after permuting the values of that feature. More specifically, given a pretrained model with n features and a test set, the average prediction score of the model on the test set is p, which is also the baseline accuracy. We shuffle the values of one feature on the test set and compute the average prediction score of the model on the modified dataset. This process is performed iteratively for each feature, eventually yielding n prediction scores for the n features. We then rank the importance of the n features according to the reductions of their scores compared with the baseline accuracy p. This approach has several advantages. First, we do not need to normalize the values of the handcrafted features. Second, it can be generalized to nearly any machine learning model that takes handcrafted features as input. Third, this strategy has proved to be robust and efficient in terms of implementation.
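A minimal sketch of this procedure is given below. It assumes a fitted scikit-learn-style model exposing a score(X, y) accuracy method and NumPy arrays as input; the function name and the averaging over several shuffles are illustrative choices.

import numpy as np

def permutation_importance(model, X_test, y_test, n_repeats=5, seed=0):
    rng = np.random.default_rng(seed)
    baseline = model.score(X_test, y_test)            # baseline accuracy p
    importances = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):                  # one feature at a time
        drops = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            # Shuffle only the values of feature j across the test set.
            X_perm[:, j] = X_perm[rng.permutation(len(X_perm)), j]
            drops.append(baseline - model.score(X_perm, y_test))
        importances[j] = np.mean(drops)               # average accuracy reduction
    return importances                                # rank features by this value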
Model-specific explanation. There also exist explanation methods specifically designed for different models. Model-specific methods usually derive explanations by examining internal model structures and parameters. Here, we introduce how to provide feature importance for two families of machine learning models.
Generalized linear models (GLMs) constitute a family of models that compute a linear combination of input features and model parameters and feed it to some (often nonlinear) transformation function. Examples of GLMs include linear regression and logistic regression. The weights of a