You could consider the model’s complexity: Is it simple enough to be examined all at once by a human?
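One concrete reading of that question is simulatability: a model is transparent in this sense if a person can step through its entire computation by hand. A minimal sketch, with hypothetical feature names and coefficients:

```python
# A two-feature linear model is arguably simulatable: a person can
# reproduce every prediction with pencil and paper.
weights = {"age": 0.03, "income": 0.5}  # hypothetical, hand-readable coefficients
bias = -1.2

def predict(example):
    """Score = bias + sum of weight * feature value, for each named feature."""
    return bias + sum(weights[name] * value for name, value in example.items())

# One can verify this by hand: -1.2 + 0.03*40 + 0.5*3.0 = 1.5
score = predict({"age": 40, "income": 3.0})
```

A model with millions of parameters admits exactly the same arithmetic, but no human can hold it all at once, which is the sense in which size bears on this notion of interpretability.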
Other work has investigated so-called post hoc interpretations. These
interpretations might explain predictions without elucidating the mechanisms by which models work. Examples include the verbal explanations
produced by people or the saliency
maps used to analyze deep neural networks. Thus, human decisions might
admit post hoc interpretability despite
the black-box nature of human brains,
revealing a contradiction between two
popular notions of interpretability.
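One common post hoc technique of this kind scores each input feature by how much the output changes when that feature is perturbed. The sketch below uses finite differences on a stand-in "model" (a plain function, not a trained network) to illustrate the idea:

```python
def model(x):
    # Stand-in for a trained predictor: depends strongly on x[0],
    # weakly on x[1], and not at all on x[2].
    return 3.0 * x[0] ** 2 + 0.1 * x[1]

def saliency(f, x, eps=1e-4):
    """Magnitude of the finite-difference sensitivity of f at x, per input."""
    scores = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps
        scores.append(abs(f(bumped) - f(x)) / eps)
    return scores

s = saliency(model, [1.0, 2.0, 3.0])
# The map attributes the prediction mostly to x[0] and not at all to x[2],
# yet it says nothing about *how* the model computes its output.
```

This is the sense in which such explanations can be faithful to a prediction while leaving the mechanism opaque.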
This section spells out the various
desiderata of interpretability research.
The demand for interpretability arises
when a mismatch occurs between the
formal objectives of supervised learning (test-set predictive performance)
and the real-world costs in a deployment setting.
Typically, evaluation metrics require only predictions and ground-truth labels. When stakeholders additionally demand interpretability,
you might infer the existence of objectives that cannot be captured in
this fashion. In other words, the very
desire for an interpretation suggests
that predictions, and the metrics
computed on them, do not suffice to
characterize the model. You should
then ask, what are these other objectives and under what circumstances
are they sought?
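The point can be made concrete: a metric such as accuracy is a function of predictions and labels alone, so two models with very different internals are indistinguishable to it whenever their outputs agree. A minimal sketch, with invented data and models:

```python
def accuracy(predictions, labels):
    # Standard evaluation: only predictions and ground truth are consumed.
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Two stand-in "models": one a readable rule, one an opaque lookup table.
def rule_model(x):
    return 1 if x >= 0.5 else 0

memorized = {0.1: 0, 0.4: 0, 0.6: 1, 0.9: 1}
def lookup_model(x):
    return memorized[x]

inputs = [0.1, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 1]
acc_rule = accuracy([rule_model(x) for x in inputs], labels)
acc_lookup = accuracy([lookup_model(x) for x in inputs], labels)
# Both score perfectly; the metric cannot distinguish how the models work.
```

Any further preference between the two must come from objectives the metric does not capture.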
Often, real-world objectives are difficult to encode as simple mathematical functions. Otherwise, they might
just be incorporated into the objective
function and the problem would be
considered solved. For example, an algorithm for making hiring decisions
should simultaneously optimize productivity, ethics, and legality. But how
would you go about writing a function that measures ethics or legality?
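To see why such objectives resist incorporation into a training loss, imagine writing the scalarized objective directly. In the hypothetical sketch below, the productivity term is easy to stub out, but the ethics term has no obvious implementation; every name here is invented for illustration:

```python
def productivity_score(decisions):
    # Stand-in: average predicted output of the candidates hired.
    return sum(d["predicted_output"] for d in decisions) / len(decisions)

def ethics_score(decisions):
    # There is no agreed-upon function to write here; that is the point.
    raise NotImplementedError("ethics is not reducible to a formula")

def objective(decisions, weight_ethics=1.0):
    # If this could be written, the problem would be "solved" by optimization.
    return productivity_score(decisions) + weight_ethics * ethics_score(decisions)
```

Because the second term cannot be written down, the optimizer only ever sees a proxy, and the gap between proxy and intent is one place the demand for interpretability enters.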
The problem can also arise when you
desire robustness to changes in the
dynamics between the training and
deployment environments.
But what is trust? Is it simply confidence that a model will perform well? Or perhaps trust is a subjective concept?
Other authors suggest that an interpretable model is desirable because it
might help uncover causal structure in
observational data.1 The legal notion of
a right to explanation offers yet another
lens on interpretability. Finally, sometimes the goal of interpretability might
simply be to get more useful information from the model.
While the discussed desiderata, or objectives of interpretability, are diverse, they typically speak to situations where standard ML problem formulations (for example, maximizing accuracy on held-out data that the training data represents perfectly) are imperfectly matched to the complex real-life tasks they are meant to solve. Consider medical research
with longitudinal data. The real goal
may be to discover potentially causal
associations that can guide interventions, as with smoking and cancer.29
The optimization objective for most
supervised learning models, however,
is simply to minimize error, a feat that
might be achieved in a purely correlative fashion.
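A small sketch of this gap: when a non-causal correlate tracks the causal variable in the training data, a model built on the correlate fits exactly as well as one built on the cause. The records and feature names below are invented for illustration:

```python
# Hypothetical longitudinal records: 'smokes' is causal for risk;
# 'carries_lighter' is a mere correlate of smoking.
records = [
    {"smokes": 1, "carries_lighter": 1, "risk": 0.9},
    {"smokes": 1, "carries_lighter": 1, "risk": 0.8},
    {"smokes": 0, "carries_lighter": 0, "risk": 0.1},
    {"smokes": 0, "carries_lighter": 0, "risk": 0.2},
]

def mse(weight, feature):
    # Error of the one-feature model: risk ~ weight * feature.
    return sum((r["risk"] - weight * r[feature]) ** 2 for r in records) / len(records)

# Minimizing error cannot tell these two models apart, even though
# intervening on the correlate would change nothing about risk.
err_cause = mse(0.85, "smokes")
err_proxy = mse(0.85, "carries_lighter")
```

Nothing in the loss distinguishes the causal model from the correlative one; that distinction is precisely what the error objective leaves out.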
Another example of such a mismatch arises when the available training data imperfectly represents the likely deployment environment. Real environments often have changing dynamics.
Imagine training a product recommender for an online store, where new
products are periodically introduced,
and customer preferences can change
over time. In more extreme cases, actions from an ML-based system may
alter the environment, invalidating future predictions.
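A toy version of this failure, with invented numbers: a decision threshold tuned on training data stops separating the classes once preferences drift after deployment.

```python
def best_threshold(scores_neg, scores_pos):
    # Pick the midpoint between the two classes seen at training time.
    return (max(scores_neg) + min(scores_pos)) / 2

def accuracy(th, scores_neg, scores_pos):
    correct = sum(s < th for s in scores_neg) + sum(s >= th for s in scores_pos)
    return correct / (len(scores_neg) + len(scores_pos))

# Training-time preference scores for "won't buy" vs. "will buy" items.
train_neg, train_pos = [0.1, 0.2, 0.3], [0.7, 0.8, 0.9]
th = best_threshold(train_neg, train_pos)  # midpoint near 0.5

# After deployment, preferences drift upward: old negatives score higher.
deploy_neg, deploy_pos = [0.55, 0.6, 0.65], [0.7, 0.8, 0.9]
acc_train = accuracy(th, train_neg, train_pos)
acc_deploy = accuracy(th, deploy_neg, deploy_pos)  # drops: negatives cross the threshold
```

The held-out estimate made at training time says nothing about this degradation, which is one reason stakeholders ask for more than a score.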
After addressing the desiderata of
interpretability, this article considers
which properties of models might
render them interpretable. Some papers equate interpretability with understandability or intelligibility,16
(that is, you can grasp how the models
work). In these papers, understandable models are sometimes called
transparent, while incomprehensible
models are called black boxes. But
what constitutes transparency? You
might look to the algorithm itself:
Will it converge? Does it produce a
unique solution? Or you might look to
its parameters: Do you understand
what each represents?