plain interactions between combinatorics and learned models.
In order to trust deployed AI systems,
we must not only improve their robustness [5],
but also develop ways to make
their reasoning intelligible. Intelligibility will help us spot AI that makes
mistakes due to distributional drift or
incomplete representations of goals
and features. Intelligibility will also
facilitate control by humans in increasingly common collaborative human/AI
teams. Furthermore, intelligibility will
help humans learn from AI. Finally,
there are legal reasons to want intelligible AI, including the European GDPR
and a growing need to assign liability
when AI errs.
Depending on the complexity of
the models involved, two approaches
to enhancing understanding may be
appropriate: using an inherently interpretable model, or adopting an inscrutably complex model and generating post hoc explanations by mapping
it to a simpler, explanatory model
through a combination of currying
and local approximation. When learning a model over a medium number
of human-interpretable features, one
may confidently balance performance
and intelligibility with approaches
like GA2Ms. However, for problems
with thousands or millions of features, performance requirements
likely force the adoption of inscrutable methods, such as deep neural
networks or boosted decision trees.
In these situations, post hoc explanations may be the only way to facilitate human understanding.
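The local-approximation idea mentioned above can be illustrated with a minimal sketch. This is not the authors' method or any particular library's algorithm. It assumes a hypothetical two-feature `black_box` function standing in for an inscrutable model, and fits a proximity-weighted linear surrogate around one instance. The names `black_box` and `local_linear_explanation`, and the parameters `scale` and `width`, are all illustrative assumptions. Currying (fixing some inputs to reduce the function's arity) is not shown here.

```python
import numpy as np

def black_box(X):
    """Hypothetical stand-in for an inscrutable model (an assumption, not from the article)."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_linear_explanation(predict, x0, n_samples=2000, scale=0.1, width=0.25, seed=0):
    """Fit a proximity-weighted linear surrogate to `predict` around the instance x0.

    Returns (intercept, weights); each weight approximates the local effect
    of one feature on the prediction near x0.
    """
    rng = np.random.default_rng(seed)
    # Sample perturbed points in a small neighborhood of x0.
    X = x0 + scale * rng.standard_normal((n_samples, x0.size))
    y = predict(X)
    # Weight each sample by its proximity to x0 (Gaussian kernel).
    dist2 = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-dist2 / (2 * width ** 2))
    # Weighted least squares on features centered at x0, plus an intercept column.
    A = np.hstack([np.ones((n_samples, 1)), X - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[0], coef[1:]

x0 = np.array([0.0, 1.0])
intercept, weights = local_linear_explanation(black_box, x0)
print("local feature weights:", weights)
```

For this toy function the fitted weights should land close to the gradient at `x0`, which is (1, 2). A surrogate of this kind is only faithful near the chosen instance, which is why the choice of neighborhood (`scale`, `width`) matters as much as the choice of explanatory features.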
Research on explanation algorithms
is developing rapidly, with
work on both local (instance-specific)
explanations and global approximations
to the learned model. A key challenge
for all these approaches is the
construction of an explanation vocabulary,
essentially a set of features used
in the approximate explanation model.
Different explanatory models may
be appropriate for different choices of
explanatory foil, an aspect deserving
more attention from systems builders.
While many intelligible models
can be directly edited by a user, more
research is needed to determine how
best to map such actions back to modify
an underlying inscrutable model.
Results from psychology show that
explanation is a social process, best
thought of as a conversation. As a result,
we advocate increased work on
interactive explanation systems that
support a wide range of follow-up actions.
To spur rapid progress in this
important field, we hope to see collaboration
between researchers in
Acknowledgments. We thank E.
Adar, S. Amershi, R. Calo, R. Caruana,
M. Chickering, O. Etzioni, J. Heer, E.
Horvitz, T. Hwang, R. Kambhampati,
E. Kamar, S. Kaplan, B. Kim,
Mausam, C. Meek, M. Michelson, S.
Minton, B. Nushi, G. Ramos, M. Ribeiro, M. Richardson, P. Simard, J. Suh,
J. Teevan, T. Wu, and the anonymous
reviewers for helpful conversations and
comments. This work was supported in
part by the Future of Life Institute grant
2015-144577 (5388) with additional
support from NSF grant IIS-1420667,
ONR grant N00014-15-1-2774, and the
1. Amershi, S., Cakmak, M., Knox, W. and Kulesza, T.
Power to the people: The role of humans in interactive
machine learning. AI Magazine 35, 4 (2014), 105–120.
2. Anderson, J.R., Boyle, F. and Reiser, B. Intelligent
tutoring systems. Science 228, 4698 (1985), 456–462.
3. Besold, T. et al. Neural-Symbolic Learning and
Reasoning: A Survey and Interpretation. CoRR
abs/1711.03902 (2017). arXiv:1711.03902
4. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M.
and Elhadad, N. Intelligible models for healthcare:
Predicting pneumonia risk and hospital 30-day
readmission. In KDD, 2015.
5. Dietterich, T. Steps towards robust artificial
intelligence. AI Magazine 38, 3 (2017).
6. Doshi-Velez, F. and Kim, B. Towards a rigorous science
of interpretable machine learning. ArXiv (2017),
7. Ferguson, G. and Allen, J.F. TRIPS: An integrated
intelligent problem-solving assistant. In AAAI/
8. Fox, M., Long, D. and Magazzeni, D. Explainable
Planning. In IJCAI XAI Workshop, 2017; http://arxiv.
9. Goodfellow, I.J., Shlens, J. and Szegedy, C.
Explaining and Harnessing Adversarial Examples.
ArXiv (2014), arXiv:1412.6572
10. Grice, P. Logic and Conversation, 1975, 41–58.
11. Halpern, J. and Pearl, J. Causes and explanations: A
structural-model approach. Part I: Causes. The British
J. Philosophy of Science 56, 4 (2005), 843–887.
12. Hardt, M., Price, E. and Srebro, N. Equality of
opportunity in supervised learning. In NIPS, 2016.
13. Hendricks, L., Akata, Z., Rohrbach, M., Donahue,
J., Schiele, B. and Darrell, T. Generating visual
explanations. In ECCV, 2016.
14. Hendricks, L.A., Hu, R., Darrell, T. and Akata,
Z. Grounding visual explanations. ArXiv (2017),
15. Hilton, D. Conversational processes and causal
explanation. Psychological Bulletin 107, 1 (1990), 65.
16. Kahneman, D. Thinking, Fast and Slow. Farrar, Straus
and Giroux, New York, 2011; http://a.co/hGYmXGJ
17. Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler,
J., Viegas, F. and Sayres, R. Interpretability
beyond feature attribution: Quantitative testing with
concept activation vectors. ArXiv e-prints (Nov. 2017);
18. Koehler, D.J. Explanation, imagination, and confidence
in judgment. Psychological Bulletin 110, 3 (1991), 499.
19. Koh, P. and Liang, P. Understanding black-box
predictions via influence functions. In ICML, 2017.
20. Krause, J., Dasgupta, A., Swartz, J., Aphinyanaphongs,
Y. and Bertini, E. A workflow for visual diagnostics of
binary classifiers using instance-level explanations. In
IEEE VAST, 2017.
21. Kulesza, T., Burnett, M., Wong, W. and Stumpf, S.
Principles of explanatory debugging to personalize
interactive machine learning. In IUI, 2015.
22. Lakkaraju, H., Kamar, E., Caruana, R. and Leskovec, J.
Interpretable & explorable approximations of black
box models. KDD-FATML, 2017.
23. Lewis, D. Causal explanation. Philosophical Papers 2
24. Lim, B.Y. and Dey, A.K. Assessing demand for
intelligibility in context-aware applications. In
Proceedings of the 11th International Conference on
Ubiquitous Computing (2009). ACM, 195–204.
25. Lipton, Z. The Mythos of Model Interpretability.
In Proceedings of ICML Workshop on Human
Interpretability in ML, 2016.
26. Lombrozo, T. Simplicity and probability in causal
explanation. Cognitive Psychology 55, 3 (2007),
27. Lou, Y., Caruana, R. and Gehrke, J. Intelligible models
for classification and regression. In KDD, 2012.
28. Lundberg, S. and Lee, S. A unified approach to
interpreting model predictions. NIPS, 2017.
29. McCarthy, J. and Hayes, P. Some philosophical
problems from the standpoint of artificial intelligence.
Machine Intelligence (1969), 463–502.
30. Miller, T. Explanation in artificial intelligence: Insights
from the social sciences. Artificial Intelligence 267
(Feb. 2018), 1–38.
31. Norman, D.A. Some observations on mental models.
Mental Models, Psychology Press, 2014, 15–22.
32. Papadimitriou, A., Symeonidis, P. and Manolopoulos,
Y. A generalized taxonomy of explanations styles
for traditional and social recommender systems.
Data Mining and Knowledge Discovery 24, 3 (2012),
33. Ribeiro, M., Singh, S. and Guestrin, C. Why should I
trust you?: Explaining the predictions of any classifier.
In KDD, 2016.
34. Ribeiro, M., Singh, S. and Guestrin, C. Anchors: High-precision model-agnostic explanations. In AAAI,
35. Silver, D. et al. Mastering the game of Go with deep
neural networks and tree search. Nature 529, 7587
36. Sloman, S. Explanatory coherence and the induction of
properties. Thinking & Reasoning 3, 2 (1997), 81–110.
37. Sreedharan, S., Srivastava, S. and Kambhampati, S.
Hierarchical expertise-level modeling for user specific
robot-behavior explanations. ArXiv e-prints, (Feb.
38. Swartout, W. XPLAIN: A system for creating and
explaining expert consulting programs. Artificial
Intelligence 21, 3 (1983), 285–325.
39. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J. and
Wojna, Z. Rethinking the inception architecture for
computer vision. In CVPR, 2016.
40. Zeiler, M. and Fergus, R. Visualizing and understanding
convolutional networks. In ECCV, 2014.
Daniel S. Weld (email@example.com) is Thomas
J. Cable/WRF Professor in the Paul G. Allen School of
Computer Science & Engineering at the University of
Washington, Seattle, WA, USA.
Gagan Bansal (firstname.lastname@example.org) is a
graduate student in the Paul G. Allen School of Computer
Science & Engineering at the University of Washington,
Seattle, WA, USA.
Copyright held by authors/owners.
Publishing rights licensed to ACM.