Our work, consisting of two experiments, therefore first explores the influence of algorithmic risk assessments on human decision making and finds that providing the algorithm’s predictions does not significantly affect human assessments of recidivism. The follow-up experiment, however, demonstrates that algorithmic risk scores act as anchors that induce a cognitive bias: If we change the risk prediction made by the algorithm, participants assimilate their predictions to the algorithm’s score.
The results highlight potential shortcomings with the existing human-in-the-loop frameworks. On the one hand, when algorithms and humans make sufficiently similar decisions, their collaboration does not achieve improved outcomes. On the other hand, when algorithms fail, humans may not be able to compensate for their errors. Even if algorithms do not officially make decisions, they anchor human decisions in serious ways.
Experiment One: Human-Algorithm Similarity, not Complementarity
The first experiment examines the impact of the COMPAS algorithm on human judgments concerning the risk of recidivism. COMPAS risk scores were used because of the data available on that system, its widespread use in prior work on algorithmic fairness, and its use in numerous states.
Methods. The experiment entailed a 1 × 3 between-subjects design with the following treatments: control, in which participants see only the defendant profiles; score, in which participants see the defendant profiles and the defendants’ COMPAS scores; and disclaimer, in which participants see the defendant profiles, the defendants’ COMPAS scores, and a written advisement about the COMPAS algorithm.
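To make the treatment structure concrete, the following sketch illustrates how the three conditions differ in what a participant sees. This is a minimal, hypothetical Python illustration, not the study’s actual experiment software: the condition names and profile fields come from the description above, while the assignment logic, variable names, and display format are assumed.

import random

# Hypothetical sketch of the 1 x 3 between-subjects design described above.
# Condition names mirror the treatments; all display logic is assumed.
CONDITIONS = ["control", "score", "disclaimer"]

ADVISEMENT = (
    "Written advisement about the COMPAS algorithm "
    "(placeholder wording; the study used its own text)."
)

def assign_condition(participant_id: int) -> str:
    """Deterministically assign a participant to one of the three treatments."""
    return random.Random(participant_id).choice(CONDITIONS)

def render_profile(profile: dict, condition: str) -> str:
    """Build the stimulus shown to a participant for one defendant profile."""
    lines = [
        f"Gender: {profile['gender']}",
        f"Race: {profile['race']}",
        f"Age: {profile['age']}",
        f"Charge: {profile['charge']}",
        f"Criminal history: {profile['criminal_history']}",
    ]
    if condition in ("score", "disclaimer"):
        # Both the score and disclaimer conditions show the COMPAS risk score.
        lines.append(f"COMPAS risk score: {profile['compas_score']}")
    if condition == "disclaimer":
        # The disclaimer condition additionally shows the written advisement.
        lines.append(ADVISEMENT)
    return "\n".join(lines)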
Participants evaluated a sequence of defendant profiles that included data on gender, race, age, criminal charge, and criminal history. These profiles described real people arrested in Broward County, FL, based on information from the dataset that ProPublica used in its analysis of risk-assessment algorithms.1 While this dataset originally contained 7,214 entries, this study applied the following filters before sampling the 40 profiles that were presented to participants:
˲ Limit to black and white defendants. Prior work on the accuracy and fairness of the COMPAS algorithm limits its analyses to white and black