practice

The Mythos of Model Interpretability

DOI: 10.1145/3233231
Article development led by queue.acm.org

In machine learning, the concept of interpretability is both important and slippery.

BY ZACHARY C. LIPTON
SUPERVISED MACHINE-LEARNING models boast
remarkable predictive capabilities. But can you trust
your model? Will it work in deployment? What else
can it tell you about the world? Models should be not
only good, but also interpretable, yet the task of
interpretation appears underspecified. The
academic literature has provided diverse and
sometimes non-overlapping motivations for
interpretability and has offered myriad techniques
for rendering models interpretable. Despite this
ambiguity, many authors proclaim their models to be
interpretable axiomatically, absent further argument.
Problematically, it is not clear what common properties
unite these techniques.
This article seeks to refine the discourse on
interpretability. First, it examines the objectives of
previous papers addressing interpretability, finding
them to be diverse and occasionally discordant.
Then, it explores model properties and techniques
thought to confer interpretability, identifying
transparency to humans and post hoc
explanations as competing concepts.
Throughout, the feasibility and desirability of different notions of interpretability are discussed. The article
questions the oft-made assertions that
linear models are interpretable and
that deep neural networks are not.
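To make the contrast concrete, here is a minimal, illustrative sketch (not from the article; the data, models, and library choices are assumptions) of the two notions: transparency, in the sense of reading a linear model's learned coefficients directly, and a post hoc explanation, in which a less transparent model is probed only after training, here via permutation feature importance.

```python
# Illustrative sketch only: synthetic data, arbitrary model choices.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                    # three input features
y = (X[:, 0] - 2 * X[:, 1] > 0).astype(int)      # label depends on features 0 and 1

# "Transparency": a linear model's learned weights can be read off directly.
linear = LogisticRegression().fit(X, y)
print("linear coefficients:", linear.coef_)

# "Post hoc explanation": a less transparent model is explained after the fact,
# e.g., by measuring how much shuffling each feature degrades its score.
black_box = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)
result = permutation_importance(black_box, X, y, n_repeats=10, random_state=0)
print("permutation importances:", result.importances_mean)
```

Whether either output constitutes a satisfying "interpretation" is exactly the question the article takes up.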
Until recently, humans had a monopoly on agency in society. If you applied for a job, loan, or bail, a human
decided your fate. If you went to the
hospital, a human would attempt to
categorize your malady and recommend treatment. For consequential
decisions such as these, you might demand an explanation from the decision-making agent.
If your loan application is denied,
for example, you might want to understand the agent’s reasoning in a bid to
strengthen your next application. If
the decision was based on a flawed
premise, you might contest this premise in the hope of overturning the decision. In the hospital, a doctor’s explanation might educate you about
your condition.
In societal contexts, the reasons for a
decision often matter. For example, intentionally causing death (murder) and unintentionally causing it (manslaughter) are distinct crimes. Similarly, whether a hiring decision was based (directly or indirectly) on a protected characteristic such as race bears on its legality. However, today’s predictive models are not
capable of reasoning at all.
Over the past 20 years, rapid progress in machine learning (ML) has led
to the deployment of automatic decision processes. Most ML-based decision making in practical use works in
the following way: the ML algorithm
is trained to take some input and predict the corresponding output. For example, given a set of attributes characterizing a financial transaction, an
ML algorithm can predict the long-term return on investment. Given images from a CT scan, the algorithm
can assign a probability that the scan
depicts a cancerous tumor. The ML algorithm takes in a large corpus of (input, output) pairs and learns a model that can predict the output for a previously unseen input.
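As a purely illustrative sketch of this workflow (the feature names, model, and library here are assumptions, not details from the article), the following code fits a classifier to a corpus of (input, output) pairs and then scores a previously unseen input:

```python
# Illustrative sketch only: synthetic "transaction" data, arbitrary model choice.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Each row stands in for one case, e.g., [amount, duration, prior_defaults]
inputs = rng.normal(size=(1000, 3))
# Each label is the observed outcome for that case (e.g., repaid or not)
outputs = (inputs[:, 0] + 0.5 * inputs[:, 2] > 0).astype(int)

train_X, test_X, train_y, test_y = train_test_split(
    inputs, outputs, random_state=0)

model = GradientBoostingClassifier().fit(train_X, train_y)   # learn from (input, output) pairs
print("held-out accuracy:", model.score(test_X, test_y))
print("predicted probability for an unseen case:", model.predict_proba(test_X[:1]))
```

The model may predict well, yet nothing in this workflow explains why any particular prediction was made, which is where questions of interpretability enter.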