tive contribution decreases risk. For
example, Figure 4a shows how the patient’s age affects predicted risk. While
the risk is low and steady for young patients (for example, age < 20), it increases rapidly for older patients (age > 67).
Interestingly, the model shows a sudden jump in risk at age 86, perhaps a result of less aggressive care by doctors for patients “whose time has come.” Even more surprising is the sudden drop for patients over 100. This might be another social effect: once a patient reaches the magic “100,” he or she receives more aggressive care. One benefit of an interpretable model is its ability to highlight these issues, spurring deeper analysis.
Figure 4b illustrates another surprising aspect of the learned model: apparently, a history of asthma, a respiratory disease, decreases a patient’s risk of dying from pneumonia! This finding is counterintuitive to any physician, who knows that asthma should, if anything, increase that risk. When Caruana et al. checked the data, they concluded the lower risk was likely due to
correlated variables—asthma patients
typically receive timely and aggressive
therapy for lung issues. Therefore, although the model was highly accurate
on the test set, it would likely fail, dramatically underestimating the risk to a
patient with asthma who had not been
previously treated for the disease.
Facilitating human control of GA2M models. A domain expert can correct such an erroneous pattern by setting the weight of the model’s asthma term to zero. In fact, GA2Ms let users provide much more comprehensive feedback to the model by using a GUI to redraw the line graphs for model terms.4 An alternative remedy might be to introduce a new feature to the model, representing whether the patient had recently been seen by a pulmonologist. After adding this feature, which is highly correlated with asthma, and retraining, the newly learned model would likely reflect that asthma (by itself) increases the risk of dying from pneumonia.
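To see how such a correlated-variable effect can arise, the following sketch (in Python) simulates a hypothetical population in which asthma slightly raises the true risk of dying from pneumonia but asthma patients are far more likely to receive aggressive therapy, which lowers that risk even more. All variable names, coefficients, and the data-generating process are invented for illustration; they are not Caruana et al.’s data or model.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical data-generating process (illustration only).
    rng = np.random.default_rng(0)
    n = 50_000
    asthma = rng.binomial(1, 0.15, n)               # 15% of patients have asthma
    treated = rng.binomial(1, 0.1 + 0.8 * asthma)   # treatment strongly correlates with asthma
    logit = -2.0 + 0.5 * asthma - 1.5 * treated     # asthma raises risk; treatment lowers it more
    died = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

    # Model 1: the treatment feature is missing from the data representation.
    m1 = LogisticRegression().fit(asthma.reshape(-1, 1), died)
    print(m1.coef_[0][0])   # negative: asthma appears "protective"

    # Model 2: add the correlated feature and retrain; the asthma coefficient flips sign.
    m2 = LogisticRegression().fit(np.column_stack([asthma, treated]), died)
    print(m2.coef_[0])      # asthma positive, treatment negative

    # Crude human override of Model 1, analogous to zeroing a GA2M term's contribution.
    m1.coef_[0][0] = 0.0

The same failure mode, and the same two remedies (editing the offending term or adding the missing feature and retraining), apply to GA2M terms, only with learned curves rather than single coefficients.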
There are two more takeaways from this anecdote. First, the absence of an important feature in the data representation can cause any AI system to learn unintuitive behavior for another, correlated feature. Second, if the learner is intelligible, then this unintuitive behavior is much easier to detect and diagnose.
An end user may also care about different, and more complex, foils (the alternatives against which a prediction is contrasted) than a data scientist.
Most ML explanation systems have restricted their attention to elucidating the behavior of a binary classifier, that is, where there is only one possible foil choice. However, as we seek to explain multiclass systems, addressing this issue becomes essential.
Many systems are simply too complex to understand without approximation. Here, the key challenge is deciding which details to omit. After many years of study, psychologists have determined which details people prioritize for inclusion in an explanation: necessary causes (vs. sufficient ones); intentional actions (vs. those taken without deliberation); proximal causes (vs. distant ones); details that distinguish between fact and foil; and abnormal features.30
According to Lombrozo, humans prefer explanations that are simpler (that is, contain fewer clauses), more general, and coherent (that is, consistent with the human’s prior beliefs).26 In particular, she observed the surprising result that humans preferred simple (one-clause) explanations to conjunctive ones, even when the probability of the latter was higher than that of the former.26 These results
raise interesting questions about the
purpose of explanations in an AI system. Is an explanation’s primary purpose to convince a human to accept
the computer’s conclusions (perhaps
by presenting a simple, plausible,
but unlikely explanation) or is it to
educate the human about the most
likely true situation? Tversky, Kahneman, and other psychologists have
documented many cognitive biases
that lead humans to incorrect conclusions; for example, people reason
incorrectly about the probability of
conjunctions, with a concrete and vivid scenario deemed more likely than
an abstract one that strictly subsumes
it.16 Should an explanation system exploit human limitations or seek to
protect us from them?
Other studies raise an additional complication about how to communicate a system’s uncertain predictions to human users. Koehler found that simply presenting an explanation for a proposition makes people think that it is more likely to be true.18 Furthermore, explaining a fact in the same way as previous facts have been explained amplifies this effect.36
Inherently Intelligible Models
Several AI systems are inherently intelligible, and we previously observed that linear models support counterfactual reasoning. Unfortunately, linear models have limited utility because they often yield poor accuracy. More expressive choices include simple decision trees and compact decision lists. To concretely illustrate the benefits of intelligibility, we focus on generalized additive models (GAMs), a powerful class of ML models that relate a set of features to the target using a linear combination of (potentially nonlinear) single-feature models called shape functions.27 For example, if y represents the target and {x1, . . . , xn} represents the features, then a GAM takes the form y = β0 + ∑j fj(xj), where the fj denote shape functions and the target y is computed by summing single-feature terms. Popular shape functions include nonlinear functions, such as splines and decision trees; with linear shape functions, GAMs reduce to linear models. GA2M models extend GAMs by adding terms for pairwise interactions between features: y = β0 + ∑j fj(xj) + ∑(i,j) fij(xi, xj), where the second sum ranges over a selected set of feature pairs.
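As a minimal, self-contained illustration of these formulas, the sketch below scores a patient with two invented shape functions and one invented pairwise term; the features, coefficients, and curves are made up for exposition and have no connection to the model in Figure 4.

    import numpy as np

    beta0 = -2.0  # intercept, in log odds

    # Invented single-feature shape functions f_j(x_j).
    def f_age(age):
        return 0.03 * np.maximum(age - 67, 0)   # contribution rises for older patients

    def f_asthma(has_asthma):
        return 0.4 * has_asthma

    # Invented pairwise term f_ij(x_i, x_j), used only by the GA2M.
    def f_age_asthma(age, has_asthma):
        return 0.01 * has_asthma * np.maximum(age - 67, 0)

    def gam_score(age, has_asthma):
        # GAM: y = beta0 + sum_j f_j(x_j)
        return beta0 + f_age(age) + f_asthma(has_asthma)

    def ga2m_score(age, has_asthma):
        # GA2M: the GAM plus selected pairwise interaction terms
        return gam_score(age, has_asthma) + f_age_asthma(age, has_asthma)

    # Because the prediction is a sum, each term's contribution can be reported
    # separately, which is what the per-term plots in Figure 4 visualize.
    print(gam_score(80, 1), ga2m_score(80, 1))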
Caruana et al. observed that for domains containing a moderate number of semantic features, GA2M models achieve performance that is competitive with inscrutable models, such as random forests and neural networks, while remaining intelligible.4 Lou et al. observed that, among methods available for learning GA2M models, the version with bagged, shallow regression-tree shape functions learned via gradient boosting achieves the highest accuracy.27
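For readers who want to experiment with this style of model, one open source implementation of boosted, bagged-tree GA2Ms is the Explainable Boosting Machine in the InterpretML package. The article does not prescribe any particular toolkit, and the class and parameter names below are assumptions about that package’s API, so consult its documentation before relying on them.

    # pip install interpret   (InterpretML; API assumed, may differ across versions)
    import numpy as np
    from interpret.glassbox import ExplainableBoostingClassifier

    # Tiny synthetic stand-in for a tabular patient dataset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] * X[:, 2] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

    # An Explainable Boosting Machine is a GA2M learned with boosted, bagged shallow trees.
    ebm = ExplainableBoostingClassifier(interactions=5)   # allow up to five pairwise terms
    ebm.fit(X, y)

    # Each learned term (single feature or feature pair) can be inspected on its
    # own, much like the per-term risk plots in Figure 4.
    global_explanation = ebm.explain_global()
    # from interpret import show; show(global_explanation)  # renders interactive term plots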
Both GAM and GA2M are considered interpretable because the model’s
learned behavior can be easily understood by examining or visualizing the
contribution of terms (individual or
pairs of features) to the final prediction.
For example, Figure 4 depicts a GA2M
model trained to predict a patient’s risk
of dying due to pneumonia, showing
the contribution (log odds) to total risk
for a subset of terms. A positive contribution increases risk, whereas a nega-