Local explanations. Although it may be difficult to describe succinctly the full mapping learned by a neural network, one can instead explain what the network depends on locally. A popular approach for deep neural networks is the saliency map, typically computed as the gradient of the output corresponding to the correct class with respect to a given input. For images, this gradient can be applied as a mask, highlighting the regions of the input that, if changed, would most influence the output. Note that a saliency map offers a local explanation only. Once you move a single pixel,
you may get a very different saliency map. This contrasts with linear models, which model global relationships between inputs and outputs.
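To make this concrete, here is a minimal sketch of gradient-based saliency in PyTorch; `model`, the input tensor `x` (shape 1×C×H×W), and `target_class` are hypothetical placeholders, and taking the channelwise maximum of the gradient magnitude is just one common convention.

```python
import torch

def saliency_map(model, x, target_class):
    """Gradient of the target-class score with respect to the input pixels."""
    model.eval()
    x = x.detach().clone().requires_grad_(True)  # track gradients on the input
    score = model(x)[0, target_class]            # scalar score for the chosen class
    score.backward()                             # fills x.grad with d(score)/d(pixel)
    # Gradient magnitude, maxed over color channels -> one (H, W) mask
    return x.grad.abs().max(dim=1).values.squeeze(0)
```

Large values mark pixels that, if perturbed, would most change the class score, which is exactly the masking interpretation described above.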
Another attempt at local explanations is made by Ribeiro et al.23 In this work, the authors explain the decisions of any model in a local region near a particular point by learning a separate sparse linear model to explain the decisions of the first. Strangely, although the method's appeal over saliency maps owes to its ability to provide explanations for non-differentiable models, it is more often used when the model subject to interpretation is in fact differentiable. In this case, what is provided, besides a noisy estimate of the gradient, remains unclear. In that paper, the explanation is offered in terms of a set of superpixels. Whether this is more informative than a plain gradient may depend strongly on how one chooses the superpixels.
Moreover, absent a rigorously defined
objective, who is to say which hyper-parameters are correct?
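The procedure can be sketched in a few lines: draw perturbations around the point of interest, query the black-box model, and fit a proximity-weighted sparse linear surrogate. This is a simplified sketch of the idea, not Ribeiro et al.'s LIME implementation; `black_box`, the Gaussian perturbations, the kernel width `sigma`, and the sparsity penalty `alpha` are all illustrative choices, and passing `sample_weight` to `Lasso.fit` assumes scikit-learn 0.23 or later.

```python
import numpy as np
from sklearn.linear_model import Lasso

def local_linear_explanation(black_box, x, n_samples=1000,
                             noise=0.1, sigma=0.75, alpha=0.01, seed=0):
    """Fit a sparse linear surrogate to black_box in a region near x."""
    rng = np.random.default_rng(seed)
    # Perturb the point of interest to probe the model's local behavior
    Z = x + rng.normal(scale=noise, size=(n_samples, x.shape[0]))
    y = np.array([black_box(z) for z in Z])        # black-box predictions
    # Down-weight perturbations far from x (RBF proximity kernel)
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / sigma ** 2)
    surrogate = Lasso(alpha=alpha)                 # L1 penalty -> sparsity
    surrogate.fit(Z - x, y, sample_weight=w)
    return surrogate.coef_  # nonzero entries flag locally relevant features
```

The nonzero coefficients play the role the superpixels play for images: a small set of features the surrogate deems locally relevant.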
Explanation by example. One post hoc mechanism for explaining the decisions of a model might be to report (in addition to predictions) which other examples the model considers most similar, a method suggested by Caruana et al.2 Training a deep neural network or latent variable model for a discriminative task provides access not only to predictions but also to the learned representations. Then, for any example, in addition to generating a prediction, you can use the activations of the hidden layers to identify the k nearest neighbors based on proximity in the space learned by the model. This sort of explanation by example has precedent in how humans sometimes justify actions by analogy. For example, doctors often refer to case studies to support a planned treatment protocol.
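A minimal sketch of this procedure in PyTorch, assuming a hypothetical `encode` function that returns the model's hidden-layer activations (for example, the penultimate layer):

```python
import torch
import torch.nn.functional as F

def explain_by_example(encode, query, train_inputs, k=5):
    """Return indices of the k training examples nearest to `query`
    in the model's learned representation space (cosine similarity)."""
    with torch.no_grad():
        q = F.normalize(encode(query), dim=-1)         # (1, d) activations
        H = F.normalize(encode(train_inputs), dim=-1)  # (N, d) activations
        sims = H @ q.squeeze(0)                        # cosine similarities
    return sims.topk(k).indices                        # k most similar examples
```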
In the neural network literature, Mikolov et al.19 use such an approach to examine the learned representations of words after training the word2vec model. Their model is trained for discriminative skip-gram prediction; to examine which relationships it has learned, they enumerate nearest neighbors of words based on distances calculated in the latent space. Kim et al.10 and Doshi-Velez et al.5 have done related work in Bayesian methods, investigating case-based reasoning approaches for interpreting generative models.
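For word2vec specifically, off-the-shelf tooling exposes the same nearest-neighbor query directly. A short sketch using gensim, where the vector file path is a hypothetical placeholder:

```python
from gensim.models import KeyedVectors

# Load pretrained word2vec-format vectors (hypothetical file)
wv = KeyedVectors.load_word2vec_format("word2vec.bin", binary=True)
# Nearest neighbors of a word in the latent space
print(wv.most_similar("france", topn=5))
```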
The concept of interpretability appears simultaneously important and slippery. Earlier, this article analyzed both the motivations for interpretability and some attempts by the research community to confer it. Now let's consider the implications of this analysis and offer several takeaways.

˲ Linear models are not strictly more interpretable than deep neural networks. Despite this claim's enduring popularity, its truth value depends on which notion of interpretability is employed. With respect to algorithmic transparency, the claim seems uncontroversial, but given high-dimensional or heavily engineered features, linear models lose simulatability or decomposability, respectively. When choosing between linear and deep models, you must often trade off algorithmic transparency against decomposability. This is because deep neural networks tend to operate on raw or lightly processed features, so, if nothing else, the features are intuitively meaningful and post hoc reasoning is sensible. To get comparable performance, however, linear models often must operate on heavily hand-engineered features. Lipton et al.13 demonstrate such a case, where linear models can approach the performance of recurrent neural networks (RNNs) only at the cost of decomposability.

For some kinds of post hoc interpretation, deep neural networks exhibit a clear advantage. They learn rich representations that can be visualized, verbalized, or used for clustering. Considering the desiderata for interpretability, linear models appear to have a better track record for studying the natural world, but there seems to be no theoretical reason why this must be so. Conceivably, post hoc interpretations could prove useful in similar scenarios.

˲ Claims about interpretability must be qualified. As demonstrated here, the term interpretability does not reference a monolithic concept. To be meaningful, any assertion regarding interpretability should fix a specific definition. If the model satisfies a form of transparency, this can be shown directly; for post hoc interpretability, the objective should be made explicit, along with evidence that the offered interpretation achieves it.