[JUVENILE-FELONY COUNT] juvenile
felony charges and [JUVENILE-MISDEMEANOR COUNT] juvenile misdemeanor
charges on their record.
The descriptive paragraph in the
score treatment added the following
information:
COMPAS is risk-assessment software
that uses machine learning to predict
whether a defendant will commit a crime
within the next two years. The COMPAS
risk score for this defendant is [SCORE
NUMBER]: [SCORE LEVEL].
Finally, the descriptive paragraph in
the disclaimer treatment provided the
following information below the COMPAS score, which mirrored the language the Wisconsin Supreme Court
recommended in State v. Loomis [18]:
Some studies of COMPAS risk-assessment scores have raised questions
about whether they disproportionately
classify minority offenders as having a
higher risk of recidivism.
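As a rough illustration of how the three treatments build on one another, the sketch below assembles the profile text from the templates quoted in this article. The function name `render_profile`, the dictionary keys, and the paragraph separator are assumptions made for this example; they are not taken from the study's actual survey materials.

```python
# Illustrative sketch only: key names and the function are hypothetical.
CONTROL = (
    "The defendant is a {race} {sex} aged {age}. "
    "They have been charged with: {charge}. "
    "This crime is classified as a {degree}. "
    "They have been convicted of {priors} prior crimes. "
    "They have {juv_fel} juvenile felony charges and "
    "{juv_misd} juvenile misdemeanor charges on their record."
)
SCORE = (
    "COMPAS is risk-assessment software that uses machine learning to "
    "predict whether a defendant will commit a crime within the next two "
    "years. The COMPAS risk score for this defendant is "
    "{score_number}: {score_level}."
)
DISCLAIMER = (
    "Some studies of COMPAS risk-assessment scores have raised questions "
    "about whether they disproportionately classify minority offenders as "
    "having a higher risk of recidivism."
)


def render_profile(defendant, treatment):
    """Return the profile text for 'control', 'score', or 'disclaimer'."""
    if treatment not in ("control", "score", "disclaimer"):
        raise ValueError(f"unknown treatment: {treatment}")
    parts = [CONTROL.format(**defendant)]
    # The score paragraph appears in both the score and disclaimer treatments.
    if treatment in ("score", "disclaimer"):
        parts.append(SCORE.format(**defendant))
    # The disclaimer is shown below the COMPAS score.
    if treatment == "disclaimer":
        parts.append(DISCLAIMER)
    return "\n\n".join(parts)
```

Calling `render_profile(defendant, "control")` with a dictionary that supplies all of the keys above returns only the first paragraph; the other two treatments append their paragraphs in order.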
Upon seeing each profile, participants were asked to provide their own
risk-assessment scores for the defendant and indicate if they believed the
defendant would commit another
crime within two years. Using drop-down menus, they answered the questions shown in Figure 1.
We deployed the task remotely
through the Qualtrics platform and recruited 225 respondents through Amazon Mechanical Turk, 75 for each treatment group. All workers could view the
task title, “Predicting Crime”; the task description, “Answer a survey about predicting crime”; and the keywords associated with the task, “survey, research,
and criminal justice.” Only workers
living in the U.S. could complete the
task, and they could do so only once.
During the pilot study among an initial
test group of five individuals, the survey required an average of 15 minutes
to complete. As the length and content
of the survey resembled that of Dressel
and Farid’s [6], we adopted their payment
scheme, giving workers $1 for completing the task and a $2 bonus if the overall accuracy of the respondent’s predictions exceeded 65%. This payment
structure motivated participants to pay
close attention and provide their best
responses throughout the task [6, 17].
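The payment rule described above can be written out as a small calculation. This is a minimal sketch, assuming that accuracy is the fraction of a respondent's yes/no predictions that match the recorded outcome; the function names `accuracy` and `payment` are hypothetical.

```python
def accuracy(predictions, outcomes):
    """Fraction of predictions that match the recorded outcomes."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have equal length")
    if not predictions:
        raise ValueError("at least one prediction is required")
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(predictions)


def payment(overall_accuracy, base=1.00, bonus=2.00, threshold=0.65):
    """Base pay of $1, plus a $2 bonus if accuracy exceeds 65%."""
    # "Exceeded" is read as a strict inequality: exactly 65% earns no bonus.
    if overall_accuracy > threshold:
        return base + bonus
    return base
```

Under this reading, a respondent who is right on 26 of 40 profiles (65%) receives $1, while 27 of 40 (67.5%) earns $3.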
Results. Figure 2 shows the average
accuracy of participants in the control, score, and disclaimer treatments.
defendants [3, 4, 6]. To compare the results
from this experiment with those in prior studies, this study considers only the
subset of defendants who identify as
either African-American (black) or Caucasian (white).
˲ Exclude cannabis crimes.
Interestingly, the pilot study showed participant confusion about cannabis-related
crimes such as possession, purchase,
and delivery. In the free-response section
of the survey, participants made comments such as “Cannabis is fully legal
here.” To avoid confusion about the legality of cannabis in various states, this
study excludes defendants charged with
crimes containing the term cannabis.
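The two filters above, together with a random draw of the kind used to select the profiled defendants, might be sketched as follows. The record layout (dictionaries with `race` and `charge` keys), the function names, and the fixed seed are assumptions for illustration; the article does not specify how the filtering or sampling was implemented.

```python
import random

# Race labels as they appear in this article's description of the dataset.
KEPT_RACES = {"African-American", "Caucasian"}


def filter_defendants(records):
    """Keep black or white defendants with no cannabis-related charge."""
    kept = []
    for rec in records:
        if rec["race"] not in KEPT_RACES:
            continue
        # Case-insensitive match on the term "cannabis" in the charge text.
        if "cannabis" in rec["charge"].lower():
            continue
        kept.append(rec)
    return kept


def sample_defendants(records, k=40, seed=0):
    """Draw k distinct defendants; the seed is only for reproducibility."""
    rng = random.Random(seed)
    return rng.sample(records, k)
```

Note that `random.Random.sample` raises `ValueError` if fewer than `k` records remain after filtering, which is a useful check that the filtered pool is large enough.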
From this filtered dataset, 40 defendants were randomly sampled. A
profile was generated containing information about the demographics,
alleged crime, criminal history, and
algorithmic risk assessment for each
of the defendants in the sample. The
descriptive paragraph in the control
treatment assumed the following format, which built upon that used in
Dressel and Farid’s study [6]:
The defendant is a [RACE] [SEX] aged
[AGE]. They have been charged with:
[CRIME CHARGE]. This crime is classified as a [CRIMINAL DEGREE]. They
have been convicted of [NON-JUVENILE
PRIOR COUNT] prior crimes. They have
Figure 1. Defendant profile from score treatment.
Figure 2. Accuracy rate in treatment groups. [Bar chart: x-axis, Treatment (Control, Score, Disclaimer); y-axis, Accuracy (Overall), 0 to 60%; bar labels 54%, 54%, and 51%.]