of transparency, this can be shown directly. For post hoc interpretability, work in this field should fix a clear objective and demonstrate evidence that the offered form of interpretation achieves it.
˲ In some cases, transparency may be at odds with the broader objectives of AI (artificial intelligence). Some arguments against black-box algorithms appear to preclude any model that could match or surpass human abilities on complex tasks. As a concrete example, the short-term goal of building trust with doctors by developing transparent models might clash with the longer-term goal of improving health care. When giving up predictive power, be careful that the desire for transparency is justified and not simply a concession to institutional biases against new methods.
˲ Post hoc interpretations can potentially mislead. Beware of blindly embracing post hoc notions of interpretability, especially when optimized to placate subjective demands. In such cases, one might, deliberately or not, optimize an algorithm to present misleading but plausible explanations. As humans, we are known to engage in this behavior, as evidenced in hiring practices and college admissions. Several journalists and social scientists have demonstrated that acceptance decisions attributed to virtues such as leadership or originality often disguise racial or gender discrimination.21 In the rush to gain acceptance for machine learning and to emulate human intelligence, we should all be careful not to reproduce pathological behavior at scale.
Future Work
There are several promising directions for future work. First, for some problems, the discrepancy between real-life and machine-learning objectives could be mitigated by developing richer loss functions and performance metrics. Exemplars of this direction include research on sparsity-inducing regularizers and cost-sensitive learning.
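To make this first direction concrete, here is a minimal, hypothetical sketch (not taken from this article) of how a sparsity-inducing regularizer and a cost-sensitive loss might look in practice, using scikit-learn's LogisticRegression; the synthetic data, the regularization strength C, and the 10x class weight are illustrative assumptions only.

    # Illustrative sketch only: an L1 penalty induces sparsity in the learned
    # coefficients, while class weights make the training loss cost-sensitive.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 20))                # 20 candidate features
    y = (X[:, 0] - 2 * X[:, 1] > 0).astype(int)   # only 2 features actually matter

    # Sparsity-inducing regularizer: L1-penalized logistic regression
    # drives most coefficients exactly to zero.
    sparse_clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
    sparse_clf.fit(X, y)
    print("nonzero coefficients:", int(np.sum(sparse_clf.coef_ != 0)))

    # Cost-sensitive learning: (hypothetically) weight errors on the positive
    # class 10x more heavily, encoding asymmetric real-world costs in the loss.
    cost_clf = LogisticRegression(class_weight={0: 1, 1: 10})
    cost_clf.fit(X, y)

The point of the sketch is that both ideas push the training objective closer to the real-life objective: the L1 penalty buys a smaller, more simulatable model, while the class weights encode asymmetric costs directly rather than leaving them implicit.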
Second, this analysis can be expanded to other ML paradigms such as reinforcement learning. Reinforcement learners can address some (but not all) of the objectives of interpretability research by directly modeling interaction between models and environments. This capability, however, may come at the cost of allowing models to experiment in the world, incurring real consequences.
Notably, reinforcement learners are able to learn causal relationships between their actions and real-world impacts. Like supervised learning, however, reinforcement learning relies on a well-defined scalar objective. For problems such as fairness, where we struggle to verbalize precise definitions of success, a shift of the ML paradigm is unlikely to eliminate the problems we face.
Related articles
on queue.acm.org
Accountability in
Algorithmic Decision Making
Nicholas Diakopoulos
https://queue.acm.org/detail.cfm?id=2886105
Black Box Debugging
James A. Whittaker and Herbert H. Thompson
https://queue.acm.org/detail.cfm?id=966807
Hazy: Making It Easier to Build
and Maintain Big-Data Analytics
Arun Kumar, Feng Niu, and Christopher Ré
https://queue.acm.org/detail.cfm?id=2431055
References
1. Athey, S. and Imbens, G. W. Machine-learning
methods, 2015; https://arxiv.org/abs/1504.01132v1.
2. Caruana, R., Kangarloo, H., Dionisio, J. D, Sinha, U. and
Johnson, D. Case-based explanation of non-case-based learning methods. In Proceedings of the Amer.
Med. Info. Assoc. Symp., 1999, 212–215.
3. Caruana, R., Lou, Y., Gehrke, J., Koch, P., Sturm, M.
and Elhadad, N. Intelligible models for healthcare:
Predicting pneumonia risk and hospital 30-day
readmission. In Proceedings of the 21st SIGKDD
Intern. Conf. Knowledge Discovery and Data Mining,
2015, 1721–1730.
4. Chang, J., Gerrish, S., Wang, C., Boyd-Graber, J. L., Blei,
D. M. Reading tea leaves: How humans interpret
topic models. In Proceedings of the 22nd Intern.
Conf. Neural Information Processing Systems, 2009,
288–296.
5. Doshi-Velez, F., Wallace, B. and Adams, R. Graph-sparse LDA: A topic model with structured sparsity.
In Proceedings of the 29th Assoc. Advance. Artificial
Intelligence Conf., 2015, 2575–2581.
6. Fair Isaac Corporation (FICO). Introduction to model
builder scorecard, 2011; http://www.fico.com/en/latest-thinking/white-papers/introduction-to-model-builder-scorecard.
7. Goodman, B. and Flaxman, S. European Union
regulations on algorithmic decision-making and
a ‘right to explanation,’ 2016; https://arxiv.org/
abs/1606.08813v3.
8. Huysmans, J., Dejaeger, K., Mues, C., Vanthienen,
J. and Baesens, B. An empirical evaluation of the
comprehensibility of decision table, tree- and rule-based predictive models. J. Decision Support Systems
51, 1 (2011), 141–154.
9. Kim, B. Interactive and interpretable machine-learning models for human-machine collaboration.
Ph.D. thesis. Massachusetts Institute of Technology,
Cambridge, MA, 2015.
10. Kim, B., Rudin, C. and Shah, J. A. The Bayesian
case model: A generative approach for case-based
reasoning and prototype classification. In Proceedings
of the 27th Intern. Conf. Neural Information Processing
Systems, Vol. 2, 1952–1960, 2014.
11. Kim, B., Glassman, E., Johnson, B. and Shah, J. iBCM:
Interactive Bayesian case model empowering humans
via intuitive interaction. Massachusetts Institute of
Technology, Cambridge, MA, 2015.
12. Krening, S., Harrison, B., Feigh, K., Isbell, C., Riedl,
M. and Thomaz, A. Learning from explanations using
sentiment and advice in RL. IEEE Trans. Cognitive and
Developmental Systems 9, 1 (2017), 41–55.
13. Lipton, Z.C., Kale, D. C. and Wetzel, R. Modeling missing
data in clinical time series with RNNs. In Proceedings
of Machine Learning for Healthcare, 2016.
14. Liu, C., Rani, P. and Sarkar, N. An empirical study
of machine-learning techniques for affect recognition
in human-robot interaction. Pattern Analysis and
Applications 9, 1 (2006), 58–69.
15. Lou, Y., Caruana, R. and Gehrke, J. Intelligible models
for classification and regression. In Proceedings of the
18th ACM SIGKDD Intern. Conf. Knowledge Discovery
and Data Mining, 2012, 150–158.
16. Lou, Y., Caruana, R., Gehrke, J. and Hooker, G. Accurate
intelligible models with pairwise interactions. In
Proceedings of the 19th ACM SIGKDD Intern. Conf.
Knowledge Discovery and Data Mining, 2013, 623–631.
17. Mahendran, A. and Vedaldi, A. Understanding
deep image representations by inverting them. In
Proceedings of the IEEE Conf. Computer Vision and
Pattern Recognition, 2015, 1–9.
18. McAuley, J. and Leskovec, J. Hidden factors and
hidden topics: Understanding rating dimensions with
review text. In Proceedings of the 7th ACM Conf.
Recommender Systems, 2013, 165–172.
19. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S. and
Dean, J. Distributed representations of words and
phrases and their compositionality. In Proceedings of
the 26th Intern. Conf. Neural Information Processing
Systems 2, 2013, 3111–3119.
20. Mordvintsev, A., Olah, C. and Tyka, M. Inceptionism:
Going deeper into neural networks. Google AI Blog;
https://ai.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html.
21. Mounk, Y. Is Harvard unfair to Asian-Americans?
New York Times (Nov. 24, 2014); http://www.nytimes.com/2014/11/25/opinion/is-harvard-unfair-to-asian-americans.html.
22. Pearl, J. Causality. Cambridge University Press,
Cambridge, U.K., 2009.
23. Ribeiro, M. T., Singh, S. and Guestrin, C. ‘Why should
I trust you?’ Explaining the predictions of any
classifier. In Proceedings of the 22nd SIGKDD Intern.
Conf. Knowledge Discovery and Data Mining, 2016,
1135–1144.
24. Ridgeway, G., Madigan, D., Richardson, T. and O’Kane,
J. Interpretable boosted naïve Bayes classification.
In Proceedings of the 4th Intern. Conf. Knowledge
Discovery and Data Mining, 1998, 101–104.
25. Simonyan, K., Vedaldi, A., Zisserman, A. Deep
inside convolutional networks: Visualising image
classification models and saliency maps, 2013; https://arxiv.org/abs/1312.6034.
26. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J.,
Erhan, D., Goodfellow, I. and Fergus, R. Intriguing
properties of neural networks, 2013; https://arxiv.org/
abs/1312.6199.
27. Tibshirani, R. Regression shrinkage and selection
via the lasso. J. Royal Statistical Society: Series B:
Statistical Methodology 58, 1 (1996), 267–288.
28. Van der Maaten, L. and Hinton, G. Visualizing data
using t-SNE. J. Machine Learning Research 9 (2008),
2579–2605.
29. Wang, H.-X., Fratiglioni, L., Frisoni, G. B., Viitanen,
M. and Winblad, B. Smoking and the occurrence
of Alzheimer’s disease: Cross-sectional and
longitudinal data in a population-based study. Amer. J.
Epidemiology 149, 7 (1999), 640–644.
30. Wang, Z., Freitas, N. and Lanctot, M. Dueling network
architectures for deep reinforcement learning. In
Proceedings of the 33rd Intern. Conf. Machine Learning
48, 2016, 1995–2003.
Zachary C. Lipton (Twitter @zacharylipton or GitHub @zackchase) is an assistant professor at Carnegie Mellon
University in Pittsburgh, PA, USA. His work addresses
diverse application areas, including medical diagnosis,
dialogue systems, and product recommendation.
He is the founding editor of the Approximately Correct
blog and the lead author of Deep Learning—The Straight
Dope, an open source interactive book teaching deep
learning through Jupyter notebooks.
Copyright held by owner/author.
Publication rights licensed to ACM. $15.00.