In addition, scientists use machine learning to get insight from big data. Medicine offers several examples. Similarly, the behavior of AlphaGo [35] has revolutionized human understanding of the game. Intelligible models greatly facilitate these processes.
Legal imperatives. The European
Union’s GDPR legislation decrees citizens’ right to an explanation, and other nations may follow. Furthermore,
assessing legal liability is a growing
area of concern; a deployed model (for example, in a self-driving car) may introduce new areas of liability by causing accidents that would not be expected from a human operator, shown as “AI-specific error”
in Figure 3. Auditing such situations to
assess liability requires understanding
the model’s decisions.
So far we have treated intelligibility
informally. Indeed, few computing researchers have tried to formally define
what makes an AI system interpretable, transparent, or intelligible. One suggested criterion is human simulatability [25]: can a human user easily
predict the model’s output for a given
input? By this definition, sparse linear
models are more interpretable than
dense or non-linear ones.
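For instance, a user can simulate a sparse linear model by summing a handful of weighted features in their head, whereas a dense or deeply non-linear model defeats this mental arithmetic. A minimal sketch (the feature names and weights below are invented for illustration):

# A hypothetical sparse linear model: only three features carry nonzero weight,
# so a user can simulate the score by summing a few products in their head.
weights = {"age": 0.03, "smoker": 0.9, "bmi": 0.05}
bias = -2.0

def risk_score(features):
    # Linear score: bias plus weight * value for each nonzero-weight feature.
    return bias + sum(w * features.get(name, 0.0) for name, w in weights.items())

# By hand: -2.0 + 0.03*60 + 0.9*1 + 0.05*25 = 1.95
print(risk_score({"age": 60, "smoker": 1, "bmi": 25}))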
Philosophers, such as Hempel and
Salmon, have long debated the nature
of explanation. Lewis [23] summarizes: “To explain an event is to provide some information about its causal history.” But many causal explanations may exist. The fact that event C causes E is best understood relative to an imagined counterfactual scenario, where absent C, E would not have occurred; furthermore, C should be minimal, an intuition known to early scientists, such as William of Occam, and formalized by Halpern and Pearl.
Following this logic, we suggest that a better criterion than simulatability is the ability to answer counterfactuals, that is, “what-if” questions. Specifically, we say that a model is intelligible to the degree that a human user can predict how a change to a feature (for example, a small increase to its value) will change the model’s output, and whether they can reliably modify that response curve. Note that if one can simulate the model, predicting its output, then one can also predict the effect of a change, but not vice versa.
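One way to operationalize this “what-if” criterion is to probe a model directly: perturb one feature, hold the rest fixed, and compare outputs. A minimal sketch, with an invented placeholder model:

def what_if(model, features, name, delta):
    # Counterfactual probe: how does the output change if one feature is
    # increased by delta while all other features stay fixed?
    perturbed = dict(features)
    perturbed[name] = perturbed[name] + delta
    return model(perturbed) - model(features)

def toy_model(f):
    # Placeholder linear risk score with a weight of 0.03 on "age".
    return -2.0 + 0.03 * f["age"] + 0.9 * f["smoker"]

# Increasing age by 10 raises the score by 10 * 0.03, i.e. about 0.3; a user
# who knows the weights can predict this answer without running the probe.
print(what_if(toy_model, {"age": 60, "smoker": 1}, "age", 10))

For a linear model, a user who knows the weights can anticipate the probe’s answer in advance; for a dense or non-linear model the probe still runs, but the user can no longer predict its result, which is exactly the distinction the criterion captures.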
Linear models are especially interpretable under this definition because they allow the answering of counterfactuals. For example, consider a naive Bayes unigram model for sentiment analysis, whose objective is to predict the emotional polarity (positive or negative) of a textual passage. Even if the model were large, combining evidence from the presence of thousands of words, one could see the effect of a given word by looking at the sign and magnitude of the corresponding weight. This answers the question, “What if the word had been omitted?” Similarly, by comparing the weights associated with two words, one could predict the effect on the model of substituting one for the other.
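A toy version of such a model (the words and weights below are invented, not learned from a corpus) shows how both counterfactuals are read directly off the weights:

# Hypothetical per-word weights: log P(word | positive) - log P(word | negative).
word_weights = {"wonderful": 1.2, "boring": -1.5, "plot": 0.1, "acting": 0.05}
log_prior = 0.0  # log P(positive) - log P(negative); assume balanced classes.

def sentiment_score(words):
    # Naive Bayes decision score: positive polarity if > 0, negative otherwise.
    return log_prior + sum(word_weights.get(w, 0.0) for w in words)

review = ["boring", "plot", "wonderful", "acting"]
print(sentiment_score(review))                                # about -0.15: slightly negative

# "What if 'boring' had been omitted?"  The score changes by exactly -(-1.5) = +1.5.
print(sentiment_score([w for w in review if w != "boring"]))  # about 1.35

# Substituting "wonderful" for "boring" shifts the score by the weight difference.
print(word_weights["wonderful"] - word_weights["boring"])     # about 2.7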
Ranking intelligible models. Since
one may have a choice of intelligible
models, it is useful to consider what
makes one preferable to another. Social science research suggests an explanation is best considered a social process, a conversation between explainer and explainee [15, 30]. As a result, Grice’s rules for cooperative communication [10] may hold for intelligible explanations. Grice’s maxim of quality says to be truthful, relating only things that are supported by evidence. The maxim of quantity says to give as much information as is needed, and no more. The maxim of relation says to mention only things that are relevant to the discussion. And the maxim of manner says to avoid ambiguity, being as clear as possible.
Miller summarizes decades of psychological research, noting that explanations are contrastive, that is, of the form “Why P rather than Q?” The event in question, P, is termed the fact and Q is called the foil [30]. Often the
foil is not explicitly stated even though
it is crucially important to the explanation process. For example, consider the question, “Why did you predict
the image depicts an indigo bunting?” An explanation that points to
the color blue implicitly assumes the
foil is another bird, such as a chickadee. But perhaps the questioner wonders why the recognizer did not predict a pair of denim pants; in this case
a more precise explanation might
highlight the presence of wings and a
beak. Clearly, an explanation targeted
to the wrong foil will be unsatisfying,
but the nature and sophistication of a
foil can depend on the end user’s expertise; hence, the ideal explanation
will differ for different people [6]. For example, to verify that an ML system is
fair, an ethicist might generate more
Figure 4. A part of Figure 1 from Caruana et al. [4] showing three (of 56 total) components for a GA2M model, which was trained to predict a patient’s risk of dying from pneumonia. The two line graphs depict the contribution of individual features to risk: the patient’s age and the Boolean variable asthma. The y-axis denotes each feature’s contribution (log odds) to predicted risk. The heat map visualizes the contribution due to the pairwise interaction between age and cancer.
(a) Age (b) Asthma (c) Age vs. Cancer
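The intelligibility of a GA2M comes from this additive structure: the predicted log odds is a sum of one learned component function per feature plus a few pairwise interaction terms, so each panel of the figure can be read, and in principle modified, on its own. A rough sketch of that decomposition (the component shapes and numbers below are invented placeholders, not the functions learned by Caruana et al.):

# Sketch of a GA2M-style additive risk model: the score (log odds) is a sum of
# one function per feature plus selected pairwise interaction terms.

def f_age(age):                     # analogous to a panel like Figure 4(a)
    return 0.02 * (age - 50)

def f_asthma(has_asthma):           # analogous to a panel like Figure 4(b)
    return -0.3 if has_asthma else 0.0

def f_age_cancer(age, has_cancer):  # pairwise term, like the heat map in Figure 4(c)
    return 0.01 * (age - 50) if has_cancer else 0.0

def risk_log_odds(age, has_asthma, has_cancer):
    # Each term can be inspected or plotted on its own, which is what the
    # figure's line graphs and heat map show.
    return f_age(age) + f_asthma(has_asthma) + f_age_cancer(age, has_cancer)

print(risk_log_odds(age=70, has_asthma=True, has_cancer=True))  # 0.4 - 0.3 + 0.2 = 0.3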