Trust. Some authors suggest interpretability is a prerequisite for trust [9, 23]. Again, what is trust? Is it simply confidence that a model will perform well? If so, a sufficiently accurate model should be demonstrably trustworthy, and interpretability would serve no purpose. Trust might also be defined subjectively. For example, a person might feel more at ease with a well-understood model, even if this understanding serves no obvious purpose. Alternatively, when the training and deployment objectives diverge, trust might denote confidence that the model will perform well with respect to the real objectives and scenarios.
For example, consider the growing use of ML models to forecast crime rates for purposes of allocating police officers. The model may be trusted to make accurate predictions but not to account for racial biases in the training data or for the model's own effect in perpetuating a cycle of incarceration by over-policing some neighborhoods.
An end user might also be said to trust an ML model if they are comfortable relinquishing control to it. Through this lens, you might care not only about how often a model is right, but also about which examples it gets right. If the model tends to make mistakes only on those kinds of inputs where humans also make mistakes, and thus is typically accurate whenever humans are accurate, then you might trust the model owing to the absence of any expected cost of relinquishing control. If a model tends to make mistakes on inputs that humans classify accurately, however, then there may always be an advantage to maintaining human supervision of the algorithms.
Causality. Although supervised learning models are directly optimized only to make associations, researchers often use them in the hope of inferring properties of the natural world. For example, a simple regression model might reveal a strong association between thalidomide use and birth defects, or between smoking and lung cancer [29].
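To make the distinction concrete, here is a minimal sketch of how such an association is read off a fitted model. It assumes scikit-learn and NumPy and uses synthetic, hypothetical data (the variables and coefficients are illustrative only); the fitted coefficient summarizes an association, nothing more.

```python
# Minimal sketch: a regression surfaces an association, not a causal effect.
# The data here are synthetic and hypothetical; in real observational data, an
# omitted confounder could be driving both the exposure and the outcome.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
smoking = rng.integers(0, 2, size=n)                # hypothetical exposure
age = rng.normal(50, 10, size=n)                    # hypothetical covariate
logits = -4.0 + 1.2 * smoking + 0.03 * age          # synthetic data-generating process
cancer = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

X = np.column_stack([smoking, age])
model = LogisticRegression(max_iter=1000).fit(X, cancer)

# The coefficient on `smoking` quantifies an association learned from the data;
# nothing in the fitting procedure certifies it as causal.
print(dict(zip(["smoking", "age"], model.coef_[0])))
```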
The associations learned by supervised learning algorithms are not guaranteed to reflect causal relationships. There could always be unobserved causes responsible for both associated variables. You might hope, however, that by interpreting supervised learning models, you could generate hypotheses that scientists could then test. For example, Liu et al. [14] emphasize regression trees and Bayesian neural networks, suggesting these models are interpretable and thus better able to provide clues about the causal relationships between physiologic signals and affective states. The task of inferring causal relationships from observational data has been extensively studied [22]. Causal inference methods, however, tend to rely on strong assumptions and are not widely used by practitioners, especially on large, complex datasets.
Transferability. Typically, training and test data are chosen by randomly partitioning examples from the same distribution. A model's generalization error is then judged by the gap between its performance on training and test data. Humans exhibit a far richer capacity to generalize, however, transferring learned skills to unfamiliar situations. ML algorithms are already used in such situations, such as when the environment is nonstationary. Models are also deployed in settings where their use might alter the environment, invalidating their future predictions.
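As a reference point for the protocol just described, here is a minimal sketch, assuming scikit-learn and its bundled digits dataset (chosen purely for illustration): partition one dataset at random and read the generalization gap off the train and test scores.

```python
# Minimal sketch of the standard evaluation protocol: a random split of a single
# dataset, with the generalization gap taken as train accuracy minus test accuracy.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
gap = model.score(X_train, y_train) - model.score(X_test, y_test)
print(f"train/test accuracy gap: {gap:.3f}")

# Both splits come from the same distribution, so a small gap says nothing about
# performance under distribution shift, nonstationarity, or feedback from the
# model's own deployment.
```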
As an example of deployment altering the environment, Caruana et al. [3] describe a model trained to predict the probability of death from pneumonia that assigned less risk to patients if they also had asthma. Presumably, asthma was predictive of a lower risk of death because of the more aggressive treatment these patients received. If the model were deployed to aid in triage, these patients might then receive less aggressive treatment, invalidating the model.
Even worse, there are situations, such as machine learning for security, where the environment might be actively adversarial. Consider the recently discovered susceptibility of convolutional neural networks (CNNs) to adversarial examples: images perturbed in ways imperceptible to a human can cause the networks to misclassify them [26]. Of course, this is not overfitting in the classical sense. The models both achieve strong results on training data and generalize well when classifying held-out test data. The crucial distinction is that these images have been altered in ways that, while subtle to human observers, the models never encountered during training. These are mistakes a human would not make, however, and it would be preferable that models not make them either.
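To illustrate, here is a minimal sketch of how such a perturbation can be constructed using the fast gradient sign method, a well-known construction that is not necessarily the procedure of the cited work. It assumes PyTorch and a pretrained torchvision classifier; the input here is a random stand-in for a real preprocessed image.

```python
# Minimal sketch of an adversarial perturbation via the fast gradient sign method:
# nudge the input in the direction that increases the loss on the model's own
# predicted class, keeping the change small enough to be subtle.
import torch
import torchvision.models as models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
image = torch.rand(1, 3, 224, 224)      # stand-in for a real preprocessed image
label = model(image).argmax(dim=1)      # the class the model currently predicts

image.requires_grad_(True)
loss = torch.nn.functional.cross_entropy(model(image), label)
loss.backward()

epsilon = 0.01                          # small perturbation budget
adversarial = (image + epsilon * image.grad.sign()).detach()

print("original prediction: ", label.item())
print("perturbed prediction:", model(adversarial).argmax(dim=1).item())
```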
Already, supervised learning models are regularly subject to such adversarial manipulation. Consider the models used to generate credit ratings; higher scores should signify a higher probability that an individual will repay a loan. According to its own technical report, FICO trains credit models using logistic regression [6], specifically citing interpretability as a motivation for the choice of model. Features include dummy variables representing binned values for the average age of accounts, the debt ratio, the number of late payments, and the number of accounts in good standing.
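Here is a minimal sketch of a scorecard-style model in the spirit of that description: bin a few raw attributes, expand the bins into dummy variables, and fit a logistic regression whose coefficients attach to individual, human-readable bins. The feature names, bin edges, and data are hypothetical and are not FICO's actual specification.

```python
# Minimal sketch: binned features expanded into dummy variables, then a logistic
# regression whose per-bin coefficients are straightforward to read.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 5_000
raw = pd.DataFrame({
    "avg_account_age_yrs": rng.uniform(0, 30, n),   # hypothetical attributes
    "debt_ratio": rng.uniform(0, 1.5, n),
    "late_payments": rng.integers(0, 10, n),
})
repaid = rng.integers(0, 2, n)                      # hypothetical outcome labels

# Bin each raw attribute, then one-hot encode the bins as dummy variables.
binned = pd.DataFrame({
    "age_bin": pd.cut(raw["avg_account_age_yrs"], bins=[0, 2, 5, 10, 30]),
    "debt_bin": pd.cut(raw["debt_ratio"], bins=[0, 0.3, 0.6, 1.5]),
    "late_bin": pd.cut(raw["late_payments"], bins=[-1, 0, 2, 10]),
}).astype(str)

model = make_pipeline(OneHotEncoder(handle_unknown="ignore"),
                      LogisticRegression(max_iter=1000))
model.fit(binned, repaid)

# Each coefficient attaches to a single named bin, which is the sense in which
# such a model is often described as interpretable.
encoder = model.named_steps["onehotencoder"]
coefs = model.named_steps["logisticregression"].coef_[0]
for name, coef in zip(encoder.get_feature_names_out(binned.columns), coefs):
    print(f"{name:40s} {coef:+.3f}")
```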
Several of the features listed above can be manipulated at will by credit-seekers. For example, one's debt ratio can be improved simply by requesting periodic increases to credit lines while keeping spending patterns constant: because the ratio divides outstanding balances by available credit, raising the limits lowers the ratio even though the balances are unchanged. Similarly, the total number of accounts can be increased simply by applying for new accounts when the probability of acceptance is reasonably high.
Indeed, FICO and Experian both acknowledge that credit ratings can be manipulated, even offering guides for improving one's credit rating. These rating-improvement strategies do not fundamentally change one's underlying ability to pay a debt. The fact that individuals actively and successfully game the rating system may invalidate its predictive power.
Informativeness. Sometimes, decision theory is applied to the outputs of supervised models to take actions in the real world. In another common use paradigm, however, the supervised model is used instead to provide information to human decision-makers, a setting considered by Kim et al. [11] and Huysmans et al. [8]. While the machine-learning objective might be to reduce error, the real-world purpose is to provide useful information. The most obvious way that a model conveys information is via its outputs. However, we might hope that by probing the patterns the model has extracted, we can convey additional information to a human decision-maker.
An interpretation may prove informative even without shedding light on