be wrongly incarcerated, while white
defendants were more likely to be set
free but nevertheless recidivate. Northpointe (now Equivant), the company
behind the risk assessment, countered
that its tool was equally accurate in
predicting recidivism for black and
white defendants. Since then, computer scientists and statisticians have
debated the different qualities that an
intuitive sense of fairness might imply:
that a risk score is equally accurate in
predicting the likelihood of recidivism
for members of different racial groups;
that members of different groups have
the same chance of being wrongly predicted to recidivate; or that failure to
predict recidivism happens at the same
rate across groups. While each of these
expectations of a fair score might seem
like complementary requirements, recent work has established that satisfying all three at the same time would be
impossible in most situations; meeting two will mean failing to comply
with the third.4,7 Even if Northpointe
had been more sensitive to disparities
in the false positive and false negative
rates, the appropriate way to handle
such a situation may not have been obvious. Favoring certain fairness properties over others could just as well have
reflected a difference in values, rather
than a failure to recognize the values
at stake. One thing is for certain: this
use of data science has prompted a
vigorous debate, making clear that our
normative commitments are not well
articulated, that fuzzy values will be
difficult to resolve computationally,
and that existing ethical frameworks
may not deliver clear answers to data
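To make the trade-off among these three expectations concrete, here is a minimal Python sketch using hypothetical confusion-matrix counts for two groups with different base rates of recidivism. It makes several simplifying assumptions: the risk score is reduced to a binary high/low prediction (Kleinberg et al. work with calibrated scores rather than a single threshold), "equally accurate" is read as equal positive predictive value (PPV), and the counts, the `Confusion` class, and the `implied_fnr` helper are illustrative inventions, not Northpointe's data or any published implementation. The sketch rests on an identity that follows directly from the confusion matrix, where p is a group's base rate: FPR = p/(1 - p) * (1 - PPV)/PPV * (1 - FNR). If two groups share PPV and FPR but differ in p, their false negative rates (FNR) cannot match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for one group (hypothetical data)."""
    tp: int  # flagged high risk, did recidivate
    fp: int  # flagged high risk, did not recidivate (wrongly flagged)
    fn: int  # flagged low risk, did recidivate (missed)
    tn: int  # flagged low risk, did not recidivate

    @property
    def base_rate(self) -> float:
        # Share of the group that actually recidivated.
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)

    @property
    def ppv(self) -> float:
        # Of those flagged high risk, the share who recidivated.
        return self.tp / (self.tp + self.fp)

    @property
    def fpr(self) -> float:
        # Of those who did not recidivate, the share wrongly flagged.
        return self.fp / (self.fp + self.tn)

    @property
    def fnr(self) -> float:
        # Of those who did recidivate, the share the score missed.
        return self.fn / (self.fn + self.tp)


def implied_fnr(base_rate: float, ppv: float, fpr: float) -> float:
    """Hypothetical helper: solve FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR) for FNR."""
    odds = base_rate / (1 - base_rate)
    return 1 - fpr / (odds * (1 - ppv) / ppv)


# Two hypothetical groups of 1,000 people with different base rates.
group_a = Confusion(tp=150, fp=100, fn=350, tn=400)  # base rate 0.5
group_b = Confusion(tp=210, fp=140, fn=90, tn=560)   # base rate 0.3

for name, g in [("A", group_a), ("B", group_b)]:
    # The identity pins FNR down once base rate, PPV, and FPR are fixed.
    assert abs(implied_fnr(g.base_rate, g.ppv, g.fpr) - g.fnr) < 1e-9
    print(f"{name}: base={g.base_rate:.2f} PPV={g.ppv:.2f} "
          f"FPR={g.fpr:.2f} FNR={g.fnr:.2f}")
# A: base=0.50 PPV=0.60 FPR=0.20 FNR=0.70
# B: base=0.30 PPV=0.60 FPR=0.20 FNR=0.30
```

In this toy example both groups receive scores that are equally predictive and equally likely to wrongly flag someone, yet the score misses 70% of group A's recidivists and only 30% of group B's. Outside special cases such as equal base rates or perfect prediction, moving the threshold to equalize FNR would break one of the other two equalities instead.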
Toward a Constructive
The critical writing on data science has
taken the paradoxical position of insisting that normative issues pervade
all work with data while leaving unaddressed the issue of data scientists’
ethical agency. Critics need to consider how data scientists learn to think
about and handle these trade-offs,
while practicing data scientists need
to be more forthcoming about all of
the small choices that shape their decisions and systems.
Technical actors are often far more
sophisticated than critics at
understanding the limits of their analysis. In
many ways, the work of data scientists
is a qualitative practice: they are called
upon to parse an amorphous problem,
wrangle a messy collection of data,
and make it amenable to systematic
analysis. To do this work well, they
must constantly struggle to
understand the contours and the limitations
of both the data and their analysis.
Practitioners want their analysis to be
accurate, and they are deeply troubled
by the limits of tests of validity, the
problems with reproducibility, and the
shortcomings of their methods.
Many data scientists are also deeply
disturbed by those who are coming
into the field without rigorous
training and those who are playing into the
hype by promising analyses that are
not technically or socially responsible.
In this way, they should be allies
of critics. Both see a need for nuance
within the field. Unfortunately,
universalizing critiques may undermine
critics’ opportunities to work with data
scientists to meaningfully address
some of the most urgent problems.
Of course, even if data scientists
take care in their work and seek to
engage critics, they may not be well
prepared to consider the full range of
ethical issues that such work raises. In
truth, few people are. Our research
suggests that the informal networks that data
scientists rely on are fallible,
incomplete, and insufficient, and that this
is often frustrating for data scientists.
In order to bridge the
socio-technical gap that Ackerman warned about
20 years ago, data scientists and critics
need to learn to appreciate each
other's knowledge, practices, and limits.
Unfortunately, there are few places in
which such learning can occur. Many
data scientists feel as though critics
only talk at them. When we asked one
informant why he did not try to talk
back, he explained that social
scientists and humanists were taught to
debate and that he was not. Critics are
rewarded for speaking out publicly, he
said, and for writing essays
addressed to a general audience.
That was neither his skill set nor
something his peers recognized as productive.
The gaps between data scientists
and critics are wide, but critique divorced from practice only increases
them. Data scientists, as the ones closest to the work, are often best positioned to address ethical concerns, but
they frequently need help from those who
are willing to take time to understand
what they are doing and the challenges
of their practice. We must work collectively to make the deliberation that
is already a crucial part of data science
visible. Doing so will reveal far more
common ground between data scientists and their critics and provide a
meaningful foundation from which to
articulate shared values.
1. Ackerman, M.S. The intellectual challenge of CSCW:
The gap between social requirements and technical
feasibility. Human-Computer Interaction 15, 2–3,
2. Angwin, J. et al. Machine bias. ProPublica (May 23,
3. Collins, H.M. Artificial Experts: Social Knowledge and
Intelligent Machines. MIT Press, 1993.
4. Corbett-Davies, S. et al. A computer program used
for bail and sentencing decisions was labeled biased
against blacks. It’s actually not that clear. Washington
Post (Oct. 17, 2016); https://www.washingtonpost.
5. Feldman, M. et al. Certifying and removing disparate
impact. In Proceedings of the 21st ACM SIGKDD
International Conference on Knowledge Discovery and
Data Mining, (2015), 259–268.
6. Hess, D.J. Editor’s Introduction, Studying Those Who
Study Us: An Anthropologist in the World of Artificial
Intelligence. Stanford University Press, 2001.
7. Kleinberg, J., Mullainathan, S. and Raghavan, M.
Inherent trade-offs in the fair determination
of risk scores. arXiv.org (2016); https://arxiv.org/
8. O’Neil, C. Weapons of Math Destruction: How Big Data
Increases Inequality and Threatens Democracy.
9. Žliobaitė, I. and Custers, B. Using sensitive personal
data may be necessary for avoiding discrimination in
data-driven decision models. Artificial Intelligence and
Law 24, 2 (Feb. 2016), 183–201.
Solon Barocas is an Assistant
Professor of Information Science at Cornell University.
danah boyd is a Principal
Researcher at Microsoft Research and the Founder/
President of Data & Society.
Copyright held by authors.