classifier with a natural language, image-captioning system [13]. The classifier uses training data labeled with the objects appearing in the image; the captioning system uses training data labeled with English sentences describing the appearance of the image. By training these systems
jointly, the variables in the hidden layers may get aligned to semantically
meaningful concepts, even as they are
being trained to provide discriminative
power. This results in English language
descriptions of images that have both
high image relevance (from the captioning training data) and high class
relevance (from the object recognition
training data), as shown in Figure 6.
While this method works well for
many examples, some explanations include details that are not actually present in the image; newer approaches,
such as phrase-critic methods, may create even better descriptions.
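The joint objective described above can be sketched as a weighted combination of an image-relevance term (how well the sentence fits the image) and a class-relevance term (how discriminative the sentence is). The function, arguments, and weights below are illustrative assumptions, not the authors' exact formulation:

```python
def joint_explanation_loss(caption_nll, class_reward,
                           relevance_weight=1.0, class_weight=1.0):
    """Toy combined objective for training a visual-explanation generator.

    caption_nll:  negative log-likelihood of the sentence under the
                  captioning model (low = image relevant)
    class_reward: score from a sentence classifier for the target class
                  (high = class relevant)
    Both weights are illustrative hyperparameters, not published values.
    """
    return relevance_weight * caption_nll - class_weight * class_reward
```

Minimizing such a loss pushes the generator toward sentences that both describe the image and discriminate the class, as in the albatross example of Figure 6.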
Another approach might determine whether there are hidden layers in the learned classifier that learn concepts corresponding to something meaningful. For example, Zeiler and Fergus observed that certain layers may function as edge or pattern detectors [40]. When a user can identify such layers, it may be preferable to use them in the explanation. Bau et al. describe
an automatic mechanism for matching CNN representations with semantically meaningful concepts using a
large, labeled corpus of objects, parts,
and textures; furthermore, using this alignment, their method quantitatively scores CNN interpretability, potentially suggesting a way to optimize for it.
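The alignment-and-scoring idea can be illustrated with a minimal sketch in the spirit of such concept-matching methods. The thresholding quantile, IoU cutoff, and all names here are assumptions for illustration, not the exact published procedure:

```python
import numpy as np

def unit_concept_iou(activations, concept_mask, quantile=0.995):
    """Score how well one conv unit's activations align with a concept.

    activations:   (N, H, W) float array of the unit's upsampled activation maps
    concept_mask:  (N, H, W) boolean array of ground-truth concept segmentations
    """
    # Binarize activations at a high per-unit quantile, then compare the
    # resulting mask against the concept segmentation via intersection-over-union.
    threshold = np.quantile(activations, quantile)
    unit_mask = activations > threshold
    intersection = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return intersection / union if union > 0 else 0.0

def interpretability_score(unit_activations, concept_masks, iou_cutoff=0.04):
    """Count units whose best-matching concept exceeds the IoU cutoff."""
    detectors = 0
    for acts in unit_activations:                 # one entry per conv unit
        best = max(unit_concept_iou(acts, m) for m in concept_masks.values())
        if best > iou_cutoff:
            detectors += 1
    return detectors
```

A network with more units passing the cutoff counts as more interpretable under this score, which is what makes it a candidate quantity to optimize.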
However, many obstacles remain.
As one example, it is not clear there are
satisfying ways to describe important,
discriminative features, which are often intangible, for example, textures.
An intelligible explanation may need to
define new terms or combine language
with other modalities, like patches of
an image. Another challenge is inducing first-order, relational descriptions,
which would enable descriptions such
as “a spider because it has eight legs”
and “full because all seats are occupied.”
While quantified and relational abstractions are very natural for people, progress in statistical-relational learning
has been slow and there are many open
questions for neuro-symbolic learning.
Facilitating user control with explanatory models. Generating an explanation by mapping an inscrutable
model into a simpler, explanatory
model is only half of the battle. In addition to answering counterfactuals
about the original model, we would
ideally be able to map any control actions the user takes in the explanatory
model back as adjustments to the original, inscrutable model. For example,
we illustrated how a user could directly edit a GA2M's shape curve (Figure 4b) to change the model's response
to asthma. Is there a way to interpret
such an action, made to an intelligible
explanatory model, as a modification
to the original, inscrutable model? It
seems unlikely that we will discover a
general method to do this for arbitrary
source models, since the abstraction
mapping is not invertible in general.
However, there are likely methods for
mapping backward to specific classes
of source models or for specific types
of feature-transform mappings. This is
an important area for future study.
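For additive models like a GAM, the backward mapping is trivial, because the explanatory model and the source model coincide: each feature's contribution is an explicit shape curve that a user edit overwrites directly. A minimal sketch, with a hypothetical class and feature names:

```python
import numpy as np

# A toy generalized additive model: the prediction is a sum of per-feature
# "shape curves", each stored as a lookup table over binned feature values.
class ToyGAM:
    def __init__(self, bins, curves):
        self.bins = bins        # feature -> bin edges
        self.curves = curves    # feature -> per-bin contribution to the score

    def predict(self, example):
        score = 0.0
        for feature, value in example.items():
            idx = np.searchsorted(self.bins[feature], value)
            score += self.curves[feature][idx]
        return score

    def edit_curve(self, feature, new_curve):
        # Because each feature's contribution is an explicit curve, a user
        # edit (e.g., zeroing out a spurious "asthma lowers risk" dip) maps
        # directly back onto the model -- no inversion problem arises.
        self.curves[feature] = np.asarray(new_curve, dtype=float)
```

For an inscrutable source model behind a post-hoc explanation, no such direct write-back exists, which is exactly the difficulty noted above.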
Toward Interactive Explanation
The optimal choice of explanation depends on the audience. Just as a human teacher would explain physics differently to students who know or do not yet know calculus, the technical sophistication and background knowledge of the recipient affect the suitability of a machine-generated explanation. Furthermore, the concerns of a house seeker whose mortgage application was denied due to a FICO score differ from those of a developer or data scientist debugging the system. Therefore, an ideal explainer should model the user's background over the course of many interactions.
The HCI community has long studied mental models [31], and many intelligent tutoring systems (ITSs) build explicit models of students' knowledge and misconceptions. However, the frameworks for these models are typically hand-engineered for each subject domain, so it may be difficult to adapt ITS approaches to a system that aims to explain an arbitrary model.
Even with an accurate user model,
it is likely that an explanation will not
answer all of a user’s concerns, because
the human may have follow-up questions. We conclude that an explanation
system should be interactive, supporting such questions from and actions by
the user. This matches results from psychology literature, summarized earlier,
and highlights Grice’s maxims, especially those pertaining to quantity and
relation. It also builds on Lim and Dey’s
work in ubiquitous computing, which
investigated the kinds of questions users wished to ask about complex, context-aware applications [24]. We envision an interactive explanation system that supports many different follow-up questions and actions.
Figure 6. Visual explanations are both image relevant and class relevant. In contrast, image descriptions are image relevant, but not necessarily class relevant, and class definitions are class relevant but not necessarily image relevant.
Description: This is a large bird with a white neck and a black back in the water.
Class Definition: The Laysan Albatross is a seabird with a hooked yellow beak, black back, and white belly.
Visual Explanation: This is a Laysan Albatross because this bird has a hooked yellow beak, white neck, and black back.