practice

DOI: 10.1145/3359338
Article development led by queue.acm.org

The Effects of Mixing Machine Learning and Human Judgment

Collaboration between humans and machines does not necessarily lead to better outcomes.

BY MICHELLE VACCARO AND JIM WALDO
In 1997, IBM's Deep Blue software beat the World Chess Champion Garry Kasparov in a six-game match. Since then, other programs have beaten human players in games ranging from "Jeopardy!" to Go. Inspired by his loss, Kasparov decided in 2005 to test the success of Human+AI pairs in an online chess tournament.2 He found the Human+AI team bested the solo human. More surprisingly, he also found the Human+AI team bested the solo computer, even though the machine alone outperformed the solo human.
Researchers explain this phenomenon by emphasizing that humans and machines excel in different dimensions of intelligence.9 Human chess players do well with long-term chess strategies, but they perform poorly at assessing the millions of possible configurations of pieces. The opposite holds for machines. Because of these differences, combining human and machine intelligence produces better outcomes than when each works separately.

People also view this form of collaboration between humans and machines as a possible way to mitigate the problem of bias in machine learning, which has taken center stage in recent months.12
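To make the complementarity argument concrete, here is a minimal simulation of our own (not from the article; the case types and accuracy figures are invented): a machine that excels on "tactical" cases, a human who excels on "strategic" ones, and a team that defers to whichever partner is stronger on each case.

```python
import random

random.seed(0)

N = 10_000
cases = [random.choice(("tactical", "strategic")) for _ in range(N)]
truth = [random.choice((0, 1)) for _ in range(N)]

def machine(kind, y):
    # Invented: the machine is strong on tactical cases, weak on strategy.
    p = 0.95 if kind == "tactical" else 0.60
    return y if random.random() < p else 1 - y

def human(kind, y):
    # Invented: the human is the mirror image of the machine.
    p = 0.60 if kind == "tactical" else 0.95
    return y if random.random() < p else 1 - y

def team(kind, y):
    # The team defers to whichever partner is stronger on this kind of case.
    return machine(kind, y) if kind == "tactical" else human(kind, y)

def accuracy(player):
    return sum(player(k, y) == y for k, y in zip(cases, truth)) / N

print(f"machine alone: {accuracy(machine):.1%}")
print(f"human alone:   {accuracy(human):.1%}")
print(f"human+machine: {accuracy(team):.1%}")
```

With these invented numbers, either player alone sits near 78% accuracy while the team approaches 95%, because the pairing lets each side work where it excels.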
We decided to investigate this type
of collaboration between humans and
machines using risk-assessment algorithms as a case study. In particular,
we looked at the Correctional Offender
Management Profiling for Alternative
Sanctions (COMPAS) algorithm, a well-known (perhaps infamous) risk-prediction system, and its effect on human
decisions about risk. Many state courts
use algorithms such as COMPAS to predict defendants’ risk of recidivism, and
these results inform bail, sentencing,
and parole decisions.
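That workflow can be sketched in a few lines. Everything here is hypothetical (COMPAS's actual model is proprietary, and the feature names, weights, and thresholds below are invented); the sketch only illustrates the division of labor the courts use, in which an algorithm produces a risk score and a human remains the final decision maker.

```python
from dataclasses import dataclass

@dataclass
class Defendant:
    prior_arrests: int
    age: int
    charge_severity: int  # invented scale: 1 (minor) .. 5 (serious)

def risk_score(d: Defendant) -> int:
    """Toy stand-in for a COMPAS-style 1-10 risk score (weights invented)."""
    raw = 0.4 * d.prior_arrests + 0.3 * d.charge_severity + 0.1 * max(0, 30 - d.age)
    return max(1, min(10, round(raw)))

def judge_decision(score: int, judges_own_estimate: int) -> str:
    """The human stays the final arbiter: the score informs, it does not dictate.
    judges_own_estimate is the judge's independent 1-10 risk estimate."""
    combined = 0.5 * score + 0.5 * judges_own_estimate  # hypothetical weighting
    return "detain" if combined >= 7 else "release on bail"

d = Defendant(prior_arrests=6, age=23, charge_severity=4)
s = risk_score(d)
print(f"risk score: {s} -> {judge_decision(s, judges_own_estimate=4)}")
```

The weighting inside judge_decision is the crux: how much the human leans on the algorithm's score, and when, is exactly the interaction this case study examines.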
Prior work on risk-assessment algorithms has focused on their accuracy
and fairness, but it has not addressed
their interactions with human decision makers who serve as the final arbiters. In one study from 2018, Julia
Dressel and Hany Farid compared risk
assessments from the COMPAS software and Amazon Mechanical Turk
workers, and found that the algorithm
and the humans achieved similar
levels of accuracy and fairness.6 This
study signals an important shift in the
literature on risk-assessment instruments by incorporating human subjects to contextualize the accuracy and
fairness of the algorithms. Dressel and
Farid's study, however, divorces the human decision makers from the algorithm when, in fact, the current model of use has humans and algorithms working in tandem.
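The kind of comparison Dressel and Farid ran can itself be sketched briefly. The data below is made up, but the two measures, overall accuracy and per-group false-positive rate, are the ones at the heart of the debate over COMPAS: a false positive here is a non-recidivist wrongly labeled high risk.

```python
def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def false_positive_rate(pred, truth, group, g):
    """Share of group-g non-recidivists wrongly labeled high risk."""
    idx = [i for i, (t, gr) in enumerate(zip(truth, group)) if t == 0 and gr == g]
    return sum(pred[i] == 1 for i in idx) / len(idx)

truth  = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # 1 = reoffended (made-up data)
group  = ["a", "b", "a", "b", "a", "b", "b", "a", "a", "b"]
algo   = [0, 1, 1, 1, 0, 0, 1, 0, 1, 0]   # the algorithm's predictions
humans = [0, 0, 1, 0, 1, 1, 0, 0, 1, 1]   # e.g., majority vote of crowd workers

for name, pred in [("algorithm", algo), ("humans", humans)]:
    print(name,
          f"accuracy={accuracy(pred, truth):.0%}",
          f"FPR(a)={false_positive_rate(pred, truth, group, 'a'):.0%}",
          f"FPR(b)={false_positive_rate(pred, truth, group, 'b'):.0%}")
```

In this toy data the two predictors tie on accuracy but differ in how their errors fall across groups; in Dressel and Farid's actual data, the algorithm and the crowd workers came out roughly equal on both kinds of measure, which is what makes their result striking.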