Local explanations are akin to a doctor explaining the specific reasons for a patient's diagnosis rather than communicating all of her medical knowledge. Contrast this approach with the global understanding of the model that one gets with a GA2M model. Mathematically, one can see a local explanation as currying: several variables in the model are fixed to specific values, allowing simplification.
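To make the currying view concrete, here is a minimal sketch in Python; the model f, its three features, and the instance values are invented purely for illustration. Fixing most variables at the instance's values leaves a much simpler, one-variable function to inspect.

```python
from functools import partial

# A hypothetical scoring model over three features (purely illustrative).
def f(age, income, credit_history):
    return 0.3 * age - 0.5 * income + 2.0 * credit_history

# The instance being explained.
instance = {"age": 41, "income": 52.0, "credit_history": 3}

# "Curry" the model: fix age and income at the instance's values, leaving
# only credit_history free. The result is a one-variable function that is
# far simpler to inspect in the vicinity of this instance.
f_local = partial(f, instance["age"], instance["income"])

for history in (2, 3, 4):
    print(history, f_local(history))
```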
Generating a local explanation is a common practice in AI systems. For example, early rule-based expert systems included explanation facilities that augmented a trace of the system's reasoning, for a particular case, with background knowledge. Recommender systems, one of the first deployed uses of machine learning, also induced demand for explanations of their specific recommendations; the most satisfying answers combined justifications based on the user's previous choices, the ratings of similar users, and the features of the items being recommended.
Locally approximate explanations.
In many cases, however, even a local explanation can be too complex to understand without approximation. Here, the key challenge is deciding which details to omit when creating the simpler explanatory model. Human preferences, discovered by psychologists and summarized previously, should guide algorithms that construct these simplifications.

Ribeiro et al.'s LIME system33 is a good example of a system for generating a locally approximate explanatory model of an arbitrary learned model, but it sidesteps part of the question of which details to omit. Instead, LIME requires the developer to provide two additional inputs: a set of semantically meaningful features X′ that can be computed from the original features, and an interpretable learning algorithm, such as a linear classifier (or a GA2M), which it uses to generate an explanation in terms of X′.

The insight behind LIME is shown in Figure 5. Given an instance to explain, shown as the bolded red cross, LIME randomly generates a set of similar instances and uses the black-box classifier, f, to predict their values (shown as the red crosses and blue circles). These predictions are weighted by their similarity to the input instance (akin to locally weighted regression) and used to train a new, simpler, intelligible classifier, shown in the figure as the linear decision boundary, using X′, the smaller set of semantic features. The user receives the intelligible classifier as an explanation. While this explanation model28 is likely a poor global representation of f, it is hopefully an accurate local approximation of the boundary in the vicinity of the instance being explained.

Figure 5. The intuition guiding LIME's method for constructing an approximate local explanation. Source: Ribeiro et al.
"The black-box model's complex decision function, f (unknown to LIME), is represented by the blue/pink background, which cannot be approximated well by a linear model. The bold red cross is the instance being explained. LIME samples instances, gets predictions using f, and weighs them by the proximity to the instance being explained (represented here by size). The dashed line is the learned explanation that is locally (but not globally) faithful."
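To make this procedure concrete, here is a minimal sketch of the same idea in Python; it is not Ribeiro et al.'s implementation, and the Gaussian perturbation scheme, the exponential proximity kernel, and the toy black box are all assumptions made for illustration. It perturbs an instance already expressed in the interpretable features X′, labels the perturbations with the black-box model, weights them by proximity, and fits a weighted linear surrogate whose coefficients serve as the explanation.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(f, x_prime, num_samples=500, kernel_width=1.0, seed=0):
    """Fit a weighted linear surrogate to the black-box f around x_prime."""
    rng = np.random.default_rng(seed)

    # 1. Generate "similar" instances by perturbing the instance being explained.
    samples = x_prime + rng.normal(scale=0.5, size=(num_samples, x_prime.size))

    # 2. Label each perturbed instance with the black-box classifier.
    labels = np.array([f(s) for s in samples])

    # 3. Weight samples by their proximity to the original instance.
    dists = np.linalg.norm(samples - x_prime, axis=1)
    weights = np.exp(-(dists ** 2) / kernel_width ** 2)

    # 4. Train a simple, intelligible model on the weighted samples.
    surrogate = Ridge(alpha=1.0)
    surrogate.fit(samples, labels, sample_weight=weights)
    return surrogate.coef_  # one weight per interpretable feature in X'

# Toy black box: nonlinear globally, but nearly linear close to (1, 2).
black_box = lambda v: v[0] ** 2 - 3.0 * v[1]
print(explain_locally(black_box, np.array([1.0, 2.0])))
```

Near (1, 2) the recovered coefficients should be roughly (2, -3), the local slope of the toy function: faithful in the neighborhood of the instance, but a poor description of the function globally, just as the figure suggests.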
Ribeiro et al. tested LIME on several domains. For example, they explained the predictions of a convolutional neural network image classifier by converting the pixel-level features into a smaller set of "super-pixels"; to do so, they ran an off-the-shelf segmentation algorithm that identified regions in the input image and varied the color of some of these regions when generating "similar" images. While LIME provides no formal guarantees about its explanations, studies showed that LIME's explanations helped users evaluate which of several classifiers generalizes best.
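As a rough illustration of the super-pixel idea (a sketch only; scikit-image's slic segmenter here stands in for whatever off-the-shelf algorithm Ribeiro et al. used), the following code segments an image into regions and produces one "similar" image by graying out a random subset of them. A real pipeline would generate many such perturbations and feed each through the classifier, as LIME does.

```python
import numpy as np
from skimage.segmentation import slic

def perturb_superpixels(image, n_segments=50, keep_prob=0.7, seed=0):
    """Return a copy of `image` with a random subset of super-pixels grayed out."""
    rng = np.random.default_rng(seed)

    # Segment the image into "super-pixels": contiguous regions of similar pixels.
    segments = slic(image, n_segments=n_segments, compactness=10)

    # Randomly choose which regions to keep; paint the rest a neutral gray.
    region_ids = np.unique(segments)
    keep = rng.random(region_ids.size) < keep_prob
    perturbed = image.copy()
    for region_id, keep_region in zip(region_ids, keep):
        if not keep_region:
            perturbed[segments == region_id] = 0.5
    return perturbed

# Example on a random float image; a real pipeline would score each
# perturbation with the image classifier being explained.
image = np.random.default_rng(1).random((64, 64, 3))
print(perturb_superpixels(image).shape)
```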
Choice of explanatory vocabulary.
Ribeiro et al.'s use of presegmented image regions to explain image-classification decisions illustrates the larger problem of determining an explanatory vocabulary. Clearly, it would not make sense to try to identify the exact pixels that led to the decision: pixels are too low-level a representation and are not semantically meaningful to users. In fact, a deep neural network's power comes from the very fact that its hidden layers are trained to recognize latent features in a manner that seems to perform much better than previous efforts to define such features independently. Deep networks are inscrutable exactly because we do not know what those hidden features denote.

To explain the behavior of such models, however, we must find some high-level abstraction over the input pixels that communicates the model's essence. Ribeiro et al.'s decision to use an off-the-shelf image-segmentation system was pragmatic. The regions it selected are easily visualized and carry some semantic value. However, the regions are chosen without any regard to how the classifier makes its decision. To explain a black-box model, where there is no access to the classifier's internal representation, there is likely no better option;
any explanation will have to be expressed in a vocabulary chosen independently of the model's internal reasoning.

However, if a user can access the classifier and tailor the explanation system to it, there are ways to choose a more meaningful vocabulary. One interesting method jointly trains a