Intelligibility matters for many reasons. We start by discussing technical ones, but social factors are important as well.
AI may be using inadequate features.
Features are often correlated, and
when one feature is included in a model,
machine learning algorithms extract
as much signal as possible from it, indirectly modeling other features that
were not included. This can lead to
problematic models, as illustrated by
Figure 4b (and described later), where
the learned model determined that a patient's
prior history of asthma (a lung disease) was negatively correlated with
death by pneumonia, presumably due
to correlation with (unmodeled) variables, such as these patients receiving
timely and aggressive therapy for lung
problems. An intelligible model helps humans to spot these issues and correct them, for example, by adding the missing features.[4]
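To make this failure mode concrete, here is a small synthetic sketch (not the actual pneumonia study; the data-generating numbers are invented and the code assumes numpy and scikit-learn) showing how omitting a correlated treatment variable can flip the sign a logistic regression learns for asthma.

```python
# Toy illustration: an unmodeled variable (aggressive treatment) can flip
# the sign a model learns for a correlated feature (asthma), mirroring the
# pneumonia example above. All numbers are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 50_000

asthma = rng.random(n) < 0.2                      # 20% of patients have asthma
# Asthma patients almost always receive timely, aggressive care.
treated = np.where(asthma, rng.random(n) < 0.95, rng.random(n) < 0.10)

# True risk: asthma *raises* risk, treatment lowers it substantially.
logit = -2.2 + 0.8 * asthma - 2.0 * treated
death = rng.random(n) < 1 / (1 + np.exp(-logit))

# Model that omits the treatment feature, as in the example above.
partial = LogisticRegression().fit(asthma.reshape(-1, 1).astype(float), death)
# Model that includes it.
full = LogisticRegression().fit(
    np.column_stack([asthma, treated]).astype(float), death)

print("asthma coefficient, treatment unmodeled:", partial.coef_[0][0])  # negative
print("asthma coefficient, treatment modeled:  ", full.coef_[0][0])     # positive (~0.8)
```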
Distributional drift. A deployed
model may perform poorly in the wild,
that is, when the distribution encountered during deployment differs from the distribution used during training. Furthermore, the
deployment distribution may change
over time, perhaps due to feedback
from the act of deployment. This is
common in adversarial domains, such
as spam detection, online ad pricing,
and search engine optimization. Intelligibility helps users determine when
models are failing to generalize.
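One common way to check for such drift (a generic technique, not one prescribed by this article) is to compare each feature's deployment distribution against its training distribution. The sketch below uses a two-sample Kolmogorov-Smirnov test; it assumes numpy and scipy, and the feature names and significance threshold are illustrative.

```python
# Flag features whose deployment ("live") distribution differs from the
# training distribution, using a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp

def drifted_features(train, live, names, p_threshold=0.01):
    """Return names of columns whose live distribution differs significantly
    from the training distribution."""
    flagged = []
    for j, name in enumerate(names):
        if ks_2samp(train[:, j], live[:, j]).pvalue < p_threshold:
            flagged.append(name)
    return flagged

# Example: the second feature shifts after deployment.
rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, size=(5_000, 3))
live = rng.normal(0.0, 1.0, size=(1_000, 3))
live[:, 1] += 0.5                                   # deployment-time shift
print(drifted_features(train, live, ["age", "price", "clicks"]))  # ['price']
```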
Facilitating user control. Many AI systems infer user preferences from users' actions. For example, adaptive news feeds predict which stories are likely to interest a user most. As robots become more common and enter the home, preference learning will become ever more prevalent. If users
understand why the AI performed an
undesired action, they can better issue
instructions that will lead to improved
future behavior.
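As a toy illustration of preference induction (the click log and topics are invented; the code assumes numpy and scikit-learn): a simple model learns from click logs which topics a user tends to open, and its per-topic coefficients offer one way to explain why a particular story was ranked highly.

```python
# Sketch of inducing preferences from user actions: learn which topics a
# user clicks, then rank new stories by predicted interest.
import numpy as np
from sklearn.linear_model import LogisticRegression

TOPICS = ["politics", "sports", "science", "celebrity"]

def featurize(topic):
    # One-hot encoding of the story's topic.
    return np.array([topic == t for t in TOPICS], dtype=float)

# Each log entry: (story topic, did the user click?)
click_log = [("science", 1), ("science", 1), ("politics", 0),
             ("celebrity", 0), ("sports", 1), ("science", 1),
             ("politics", 0), ("sports", 0), ("celebrity", 0)]

X = np.stack([featurize(topic) for topic, _ in click_log])
y = np.array([clicked for _, clicked in click_log])
model = LogisticRegression().fit(X, y)

# Rank candidate stories by predicted interest; the per-topic coefficients
# are one simple way to explain *why* a story was ranked highly.
scores = {t: model.predict_proba(featurize(t)[None, :])[0, 1] for t in TOPICS}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```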
User acceptance. Even if they do not
seek to change system behavior, users
have been shown to be happier with
and more likely to accept algorithmic
decisions if they are accompanied by
an explanation.[18] After being told they should have their kidney removed, it is natural for a patient to ask the doctor why, even if they do not fully understand the answer.
Improving human insight. While improved AI allows automation of tasks previously performed by humans, automation is not its only use; an intelligible model can also improve human insight by showing people what it has learned from the data.
AI may have the wrong objective. In some situations, even perfect performance on the stated metric may be insufficient, for example, if the metric is flawed or incomplete because it is difficult to specify explicitly. Pundits have warned that an automated factory charged with maximizing paperclip production could adopt the subgoal of killing humans, who use resources that could otherwise serve its task. While this example may be fanciful, it illustrates how remarkably difficult it is to balance multiple attributes of a utility function. For example, as Lipton observed,[25] "An algorithm for making hiring decisions should simultaneously optimize for productivity, ethics and legality." But how does one express this trade-off? Other examples include balancing training error against the need to uncover causality in medicine, and balancing accuracy against fairness in recidivism prediction.[12] For the latter, a simplified objective function such as accuracy, combined with historically biased training data, may cause uneven performance across groups (for example, people of color). Intelligibility empowers users to determine whether an AI is right for the right reasons.
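To see why expressing such a trade-off is hard, consider the minimal sketch below (all model names and numbers are invented): once accuracy and a fairness gap are folded into a single scalar objective, someone must choose the weight, and different weights select different "best" models.

```python
# Scalarizing two objectives (accuracy and a fairness gap) into one score.
# The weight `lam` encodes the trade-off the text says is so hard to express.
candidate_models = [
    # (name, accuracy, gap in positive-prediction rate between groups)
    ("model_a", 0.92, 0.18),
    ("model_b", 0.89, 0.07),
    ("model_c", 0.85, 0.01),
]

def scalarized(accuracy, fairness_gap, lam):
    """Combined objective: reward accuracy, penalize the fairness gap."""
    return accuracy - lam * fairness_gap

for lam in (0.1, 0.5, 2.0):
    best = max(candidate_models, key=lambda m: scalarized(m[1], m[2], lam))
    print(f"lam={lam}: best model is {best[0]}")
# lam=0.1 -> model_a, lam=0.5 -> model_b, lam=2.0 -> model_c
```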
Figure 1. Adding an imperceptibly small vector to an image changes the GoogLeNet[39] image recognizer's classification of the image from "panda" (57.7% confidence) to "gibbon" (99.3% confidence); the added perturbation is itself classified as "nematode" (8.2% confidence). Source: Goodfellow et al.[9]
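The perturbation in Figure 1 was produced with the fast gradient sign method described by Goodfellow et al.; the sketch below shows the core step. It assumes a PyTorch classifier `model`, a batched image tensor scaled to [0, 1], and the image's integer label; the epsilon value and helper name are illustrative.

```python
# Fast gradient sign method (FGSM): take a small step in the direction of
# the sign of the loss gradient with respect to the input image.
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon=0.007):
    """Return an adversarially perturbed copy of `image` (shape (1, C, H, W),
    values in [0, 1]); `label` is a length-1 LongTensor."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # An imperceptibly small step that most increases the loss.
    return (image + epsilon * image.grad.sign()).detach().clamp(0.0, 1.0)
```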
Figure 2. Approaches for crafting intelligible AI: if the model is intelligible, use it directly; if not, map it to a simpler model and interact with that simpler model through explanations and controls.
Figure 3. The dashed blue shape indicates the space of possible mistakes humans can make; the red shape denotes the AI's mistakes, and its smaller size indicates a net reduction in the number of errors. The gray region denotes AI-specific mistakes a human would never make. Despite reducing the total number of errors, a deployed model may create new areas of liability (gray), necessitating explanations.