(input, output) pairs, and outputs a model that can predict the output corresponding to a previously unseen input. Formally, researchers call this problem setting supervised learning. Then, to automate decisions fully, one feeds the model’s output into some decision rule. For example, spam filters programmatically discard email messages predicted to be spam with a level of confidence exceeding some threshold.

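To make that pipeline concrete, here is a minimal sketch assuming scikit-learn; the toy emails, the bag-of-words features, and the 0.9 confidence threshold are illustrative choices, not a description of any real spam filter.

```python
# Minimal sketch of supervised learning plus a thresholded decision rule.
# The emails, labels, and 0.9 threshold are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# (input, output) pairs: email text and a spam (1) / not-spam (0) label.
emails = ["win a free prize now", "meeting moved to 3pm",
          "claim your free prize now", "are we still on for lunch tomorrow?"]
labels = [1, 0, 1, 0]

# Learn a model that maps previously unseen inputs to predicted outputs.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
model = LogisticRegression().fit(X, labels)

# Decision rule: discard a message when the predicted probability of spam
# exceeds a confidence threshold.
THRESHOLD = 0.9

def should_discard(message: str) -> bool:
    # Column 1 of predict_proba corresponds to model.classes_[1] == 1 (spam).
    p_spam = model.predict_proba(vectorizer.transform([message]))[0, 1]
    return p_spam > THRESHOLD

print(should_discard("free prize inside"))
```

Nothing in this pipeline records why a message counts as spam; the fitted model only encodes which word counts correlated with the spam label in the training data.
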
Thus, ML-based systems do not know why a given input should receive some label, only that certain inputs are correlated with that label. For example, shown a dataset in which the only orange objects are basketballs, an image classifier might learn to classify all orange objects as basketballs. This model would achieve high accuracy even on held-out images, despite failing to grasp the difference that actually makes a difference.

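As a toy version of that failure mode (the binary features is_orange and is_round and the traffic-cone probe are invented for this sketch, again assuming scikit-learn), a classifier trained on data where color perfectly predicts the label scores perfectly on held-out data and yet mislabels any orange object:

```python
# Toy illustration of a spurious correlation: in the training set, every
# orange object happens to be a basketball, so color alone separates the classes.
from sklearn.tree import DecisionTreeClassifier

# Features: [is_orange, is_round]; label: 1 = basketball, 0 = other object.
X_train = [[1, 1], [1, 1], [0, 1], [0, 1], [0, 0], [0, 0]]
y_train = [1, 1, 0, 0, 0, 0]

X_heldout = [[1, 1], [0, 1], [0, 0]]   # drawn from the same distribution
y_heldout = [1, 0, 0]

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(model.score(X_heldout, y_heldout))   # 1.0: perfect held-out accuracy

# An orange, non-round object (say, a traffic cone) is still labeled a
# basketball, because the model learned the color correlation, not the concept.
print(model.predict([[1, 0]]))             # [1]
```

On an i.i.d. held-out split the shortcut looks flawless; only a shift in the data, such as orange objects that are not basketballs, exposes it.
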
As ML penetrates critical areas such as medicine, the criminal justice system, and financial markets, the inability of humans to understand these models seems problematic. Some suggest model interpretability as a remedy, but in the academic literature, few authors articulate precisely what interpretability means or precisely how their proposed solution is useful.

Despite the lack of a definition, a growing body of literature proposes purportedly interpretable algorithms. From this, you might conclude either that the definition of interpretability is universally agreed upon but no one has bothered to set it in writing, or that the term interpretability is ill-defined, and thus claims regarding the interpretability of various models exhibit a quasi-scientific character. An investigation of the literature suggests the latter. Both the objectives and methods put forth in the literature investigating interpretability are diverse, suggesting that interpretability is not a monolithic concept but several distinct ideas that must be disentangled before any progress can be made.

This article focuses on supervised learning rather than other ML paradigms such as reinforcement learning and interactive learning. This scope derives from the current primacy of supervised learning in real-world applications and an interest in the common claim that linear models are interpretable while deep neural networks are not.15 To gain conceptual clarity, consider these refining questions: What is interpretability? Why is it important?

Let’s address the second question first. Many authors have proposed interpretability as a means to engender trust.9,24 This leads to a similarly vexing epistemological question: What is trust? Does it refer to faith that a model will perform well? Does trust require a low-level mechanistic understanding