Michelle Vaccaro received a bachelor’s degree in
computer science in 2019 from Harvard College,
Cambridge, MA, USA.
Jim Waldo is the Gordon McKay Professor of the Practice of Computer Science at Harvard University, Cambridge, MA, USA, where he is also a professor of technology policy at the Harvard Kennedy School. Prior to joining Harvard, he spent more than 30 years in industry, much of that at
Copyright held by authors/owners.
Publications rights licensed to ACM.
susceptible to forms of cognitive bias
such as anchoring.
These findings also, importantly,
highlight problems with existing
frameworks to address machine
bias. For example, many researchers advocate for putting a “human in
the loop” to act in a supervisory capacity, and they claim this measure
will improve accuracy and, in the
context of risk assessments, “ensure
a sentence is just and reasonable.”
Even when humans make the final
decisions, however, the machine-learning models exert influence by
anchoring these decisions. An algorithm’s output still shapes the ultimate treatment for defendants.
The subtle influence of algorithms
via this type of cognitive bias may
extend to other domains such as finance, hiring, and medicine. Future
work should, no doubt, focus on the
collaborative potential of humans
and machines, as well as steps to promote algorithmic fairness. But this work must account for humans' susceptibility to cognitive biases when developing measures to address the shortcomings of machine-learning models.
The COMPAS algorithm was used here
as a case study to investigate the role
of algorithmic risk assessments in human decision-making. Prior work on
the COMPAS algorithm and similar
risk-assessment instruments focused
on the technical aspects of the tools by
presenting methods to improve their
accuracy and proposing frameworks to evaluate the fairness of their predictions. That research, however, has not considered the algorithm's practical function as a decision-making aid rather than as a decision maker.
Based on the theoretical findings
from the existing literature, some
policymakers and software engineers
contend that algorithmic risk assessments such as the COMPAS software
can alleviate the incarceration epidemic and the occurrence of violent
crimes by informing and improving
decisions about policing, treatment, and sentencing.
The first experiment described here
thus explored how the COMPAS algorithm affects accuracy in a controlled
environment with human subjects.
When predicting the risk that a defendant will recidivate, the COMPAS
algorithm achieved a significantly
higher accuracy rate than the participants who assessed defendant profiles (65.0% vs. 54.2%). Yet when participants incorporated the algorithm’s
risk assessments into their decisions,
their accuracy did not improve. The
experiment also evaluated the effect of
presenting an advisement designed to
warn of the potential for disparate impact on minorities. The findings suggest, however, that the advisement did
not significantly impact the accuracy of participants’ predictions.
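The accuracy figures above are simple proportions of correct binary predictions. A minimal sketch of that comparison, using invented labels rather than the study's data:

```python
# Sketch: comparing predictor accuracy on binary recidivism labels.
# The lists below are invented for illustration; they are not the
# study's data, which reported 65.0% (algorithm) vs. 54.2% (humans).

def accuracy(predictions, outcomes):
    """Fraction of predictions that match the observed outcomes."""
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(outcomes)

outcomes  = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = defendant recidivated
algorithm = [1, 0, 1, 0, 0, 0, 1, 1]   # hypothetical COMPAS-style calls
human     = [1, 1, 0, 0, 0, 1, 1, 1]   # hypothetical participant calls

print(f"algorithm: {accuracy(algorithm, outcomes):.1%}")  # 75.0% on this toy data
print(f"human:     {accuracy(human, outcomes):.1%}")      # 37.5% on this toy data
```

A statistical test on such proportions (for example, a two-proportion z-test) would be needed to call a gap significant, as the study does.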
Moreover, researchers have increasingly devoted attention to the
fairness of risk-assessment software.
While many people acknowledge
the potential for algorithmic bias in
these tools, they contend that leaving a human in the loop can ensure
fair treatment for defendants. The
results from the second experiment,
however, indicate that the algorithmic risk scores acted as anchors that
induced a cognitive bias: Participants assimilated their predictions
to the algorithm’s score. Participants
who viewed the set of low-risk scores
provided risk scores, on average,
42.3% lower than participants who
viewed the high-risk scores when assessing the same set of defendants.
Given this human susceptibility, an
inaccurate algorithm may still result
in erroneous decisions.
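The 42.3% figure is the relative gap between the mean scores given under the two anchor conditions. A toy version of that computation, with invented ratings rather than the experiment's data:

```python
# Sketch: measuring an anchoring effect as the relative gap between
# mean risk ratings under low-anchor vs. high-anchor conditions.
# Ratings are invented; the experiment reported a 42.3% gap.

def mean(xs):
    return sum(xs) / len(xs)

# Participants' risk ratings for the same set of defendants:
low_anchor_scores  = [3, 4, 2, 5, 3, 4]   # shown low algorithmic scores
high_anchor_scores = [6, 7, 5, 8, 6, 7]   # shown high algorithmic scores

gap = 1 - mean(low_anchor_scores) / mean(high_anchor_scores)
print(f"low-anchor means are {gap:.1%} lower")  # 46.2% on this toy data
```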
Considered in tandem, these findings indicate that collaboration between humans and machines does
not necessarily lead to better outcomes, and human supervision does
not sufficiently address problems
when algorithms err or demonstrate
concerning biases. If machines are
to improve outcomes in the criminal
justice system and beyond, future research must further investigate their
practical role: an input to human decision makers.