For example, a recent study by
Hauge et al.5 used geographic profiling techniques and public datasets to (allegedly) identify the pseudonymous artist Banksy. The study underwent ethics review and was (likely) permitted because it used public datasets, despite its intense focus on the private information of individual subjects.5
This discrepancy is made possible by
the anachronistic assumption that
any informational harm has already
been done by a public dataset. That the NPRM explicitly cites this assumption as a justification for excluding, a priori, increasingly prominent big data research methods is highly problematic.
Perhaps academic researchers should have relaxed access to maintain parity with industry or to further scientific knowledge. But the Common Rule should not allow that de facto under the guise of empirically weak claims about the risks posed by public datasets. The Common Rule might rightfully exclude big data research methods from its purview, but it should do so explicitly, not by declaring public data inherently low risk and thereby muddling attempts to moderate the risks such data pose.
Exempt—An Expanded Category
The NPRM also proposes to expand
the Exempt category (minimal review largely conducted through an
online portal) to include secondary
research using datasets containing
identifiable information collected
for non-research purposes. All such research would be exempt as long as subjects were given prior notice and the datasets are used only in the fashion identified by the requestor (§__.104(e)(2)). The NPRM does not propose to set a minimum bar for adequate notice. This can be reasonable, given that the high standard of informed consent is intended primarily for medical research and can be an unreasonable burden in the social sciences. However, defaulting to end user license agreements (EULAs) sets too low a bar. Setting new rules
for the exempt category should not
be a de facto settlement of this open
debate. Explicit guidelines and processes for future inquiry and revised
regulations are warranted.
The NPRM improves the Common Rule’s application to big data research, but the portions with consequences for such research rest on dated assumptions. The contentious history of the Common Rule
is due in part to its influence on the
tone and agenda of research ethics
even outside of its formal purview.
This rare opportunity for significant
revisions should not cement problematic assumptions into the discourse of
ethics in big data research.
1. boyd, d. and Crawford, K. Critical questions for big data. Information, Communication & Society 15, 5 (2012).
2. Committee on Revisions to the Common Rule for the Protection of Human Subjects in Research in the Behavioral and Social Sciences; Board on Behavioral, Cognitive, and Sensory Sciences; Committee on National Statistics; et al. Proposed Revisions to the Common Rule for the Protection of Human Subjects in the Behavioral and Social Sciences, 2014; http://www.nap.edu/
3. Department of Health and Human Services. Code of Federal Regulations Title 45—Public Welfare, Part 46—Protection of Human Subjects. 45 Code of Federal Regulations 46, 2009; http://www.hhs.gov/ohrp/
4. Department of Health and Human Services. Notice
of Proposed Rule Making: Federal Policy for the
Protection of Human Subjects. Federal Register,
5. Hauge, M.V. et al. Tagging Banksy: Using geographic profiling to investigate a modern art mystery. Journal of Spatial Science (2016), 1–6.
6. King, J.L. Humans in computing: Growing responsibilities for researchers. Commun. ACM 58, 3 (Mar. 2015), 31–33.
7. Kitchin, R. Big data, new epistemologies and paradigm
shifts. Big Data & Society 1, 1 (2014).
8. Kramer, A., Guillory, J., and Hancock, J. Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences 111, 24 (2014), 8788–8790.
9. Metcalf, J. Letter on Proposed Changes to the Common Rule. Council for Big Data, Ethics, and Society, 2016.
10. Metcalf, J. and Crawford, K. Where are human
subjects in big data research? The emerging ethics
divide. Big Data & Society 3, 1 (2016), 1–14.
11. Meyer, M.N. Two cheers for corporate experimentation:
The a/b illusion and the virtues of data-driven innovation.
Colorado Technology Law Journal 13, 273 (2015).
12. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research, 1979; http://www.hhs.gov/ohrp/
13. Zwitter, A. Big data ethics. Big Data & Society 1, 2 (2014).
Jacob Metcalf (firstname.lastname@example.org) is a Researcher
at the Data & Society Research Institute, and Founding
Partner at the ethics consulting firm Ethical Resolve.
This work is supported in part by National Science
Foundation award #1413864. See Metcalf9 for the public comment on revisions to the Common Rule published collectively by the Council for Big Data, Ethics, and Society. This column
represents only the author’s opinion.
Copyright held by author.
the subjects, the investigator does not contact the subjects, and the investigator will not re-identify subjects or otherwise conduct an analysis that could lead to creating individually identifiable private information. (§__.101(b)(2)(ii))4
These types of research in the context of big data present different risk profiles depending on the contents of the dataset and what is done with it. Yet they are excluded based on the assumption that their status (public, private, pre-existing, de-identified, and so forth) is an adequate proxy for risk. The proposal to create an excluded category is driven by the frustrations of social and other scientists who use data already in the public sphere or in the hands of corporations to whom users turn over mountains of useful data. Notably, social scientists have pushed to define “public datasets” to include datasets that can be purchased.2 The
power and peril of big data research is
that large datasets can theoretically be
correlated with other large datasets in
novel contexts to produce unforeseeable insights. Algorithms might find unexpected correlations and generate predictions that are a possible source of poorly understood harms. Exclusion would eliminate the ethical review needed to address such risks.
Public and private are used in the
NPRM in ways that leave this regulatory
gap open. “Public” modifies “datasets,” describing access or availability. “Private” modifies “information” or “data,” describing a reasonable subject’s expectations about sensitivity. Yet publicly available datasets containing private data are among the most interesting to researchers and most risky to subjects.