possible, and compelling public interest justifications exist for analyzing it.
In these cases, the burden of proof to
explain the public interest in analyzing unethically obtained data rests with the researchers. IRBs
and ethics committees should assist
researchers in determining whether a
compelling benefit to society justifies
using such data.
Given the risks of using data from
a security breach (both from the uncertainty of the data quality and the
risks of causing further harm), researchers must ensure that the way they use
such data minimizes these risks. The
general ethical principles 1.2 (“Avoid
harm”) and 1.6 (“Respect privacy”)
of the ACM Code of Ethics apply to
these cases, as they do to all computing research. 1 Published research
using data from a security breach
should include an ethics section
where the researchers present their
justifications for using the data and
state which IRBs and/or ethics committees reviewed and approved it. 7 IRBs
and ethics committees should also be
aware of the specific concerns raised
by unethically gathered data. Whether the data used is publicly accessible
or not should not determine on its
own whether its use poses a minimal
risk to those described within it.
References
1. ACM Code of Ethics and Professional Conduct.
Association for Computing Machinery. ACM, New York,
NY, USA; 2018; https://www.acm.org/code-of-ethics.
2. Douglas, D.M. Should Internet researchers use ill-gotten information? Science and Engineering Ethics
24, 4 (Aug. 2018), 1221–1240; https://doi.org/10.1007/
3. Metcalf, J. Big data analytics and revision of the
common rule. Commun. ACM 59, 7 (July 2016), 31–33;
4. Metcalf, J. and Crawford, K. Where are human
subjects in big data research? The emerging ethics
divide. Big Data and Society 3, 1 (June 1, 2016);
5. Poor, N. and Davidson, R. Case study: The ethics of
using hacked data: Patreon’s data hack and academic
data standards. Council for Big Data, Ethics, and
Society (Apr. 6, 2016); http://bit.ly/2NnkscM.
6. Rosenbaum, A.S. The use of Nazi medical experimentation
data: Memorial or betrayal? International Journal of
Applied Philosophy 4, 4 (Apr. 1989), 59–67.
7. Thomas, D.R. et al. Ethical issues in research using
datasets of illicit origin. In Proceedings of the 2017
Internet Measurement Conference, 445–462.
IMC ’17. ACM, New York, NY, USA; https://doi.
8. U.S. Department of Health and Human Services. 45
CFR 46. (July 19, 2018); http://bit.ly/2NianOq.
9. World Medical Association. Declaration of Helsinki—
Ethical Principles for Medical Research Involving
Human Subjects. (Oct. 19, 2013); http://bit.ly/2lJFEQw.
David M. Douglas (firstname.lastname@example.org) is a Brisbane,
Australia-based researcher in computer ethics.
Copyright held by author.
avoid punishment. The source therefore is unaccountable for the data’s
quality. This creates the possibility that
the data may have been altered or falsified to serve the source’s own purposes.
This uncertainty about the data’s
authenticity justifies at least performing a preliminary analysis to determine
whether it is genuine. Since appeals
to the public benefit of using data exposed by a security breach depend on
the data’s accuracy, the researchers
must explain how they established the
data’s authenticity and the likelihood
it is genuine. Even if the researchers refuse to use the data in their own work,
establishing whether it is likely to be
genuine is useful for confirming a security breach has occurred.
However, the fact that data from a
security breach is publicly accessible
does not mean that using it in research
creates no additional risks to
those it describes. For instance, publicly available information may be
used to harass or threaten individuals. Jacob Metcalf rightly states that
the risk to individuals from using
research data depends more on the
dataset’s contents and the research’s
use of it than on whether the
data is public, private, or deanonymized. 3 This holds for both legitimately acquired and unethically obtained data. While this might appear
to downplay the significance of how
the data was obtained and released,
the uncertainty about data quality
imposes an additional burden on using unethically obtained data. This
burden is itself a potential reason to
avoid using such data.
Against Using Exposed Data
The major arguments against using
data exposed by a security breach are:
• the unethical methods used to obtain the data ‘taint’ both the data itself
and any research using it as immoral;
• using such data grants the methods
and those who used them an unacceptable legitimacy as ‘researchers’; and
• refusing to use such data is an important statement about conducting
research ethically and deters researchers from using such methods to obtain
data in the future. 2
The claim that unethically obtained
data ‘taints’ research using it
is both symbolic and methodological:
using the data symbolically re-enacts
the harm caused by obtaining
it, and the unethical means of gathering
it also suggest the data may be of
poor quality. 2 For victims of security
breaches, research using the exposed
data may reinforce the feelings of
violation and humiliation from when
they discovered their data had been
exposed. Methodologically, since the
researchers were not involved in collecting
the data, they must also confirm
that the data is genuine and has not
been manipulated. This is part of the
previously mentioned burden of having
to authenticate and clean unethically
obtained data.
Part of the unease associated with
using unethically obtained data is that
it implies the researchers themselves
condone the methods used to obtain
it. While this is unlikely, it requires researchers to explicitly distance themselves from those who performed the
security breach in publications using
this data. While there may be legitimate reasons for conducting research
with such data, researchers must take
care to ensure that readers of their published findings understand that they do not endorse or condone the methods used to
obtain the data.
An even stronger rejection of security breaches as a data source is
refusing to use such data in research. Such a
refusal makes a clear statement about
the proper methods of conducting
research and obtaining data. It also
deters future researchers from using
such data as it means the research
community will shun their work.
However, adopting this position risks
neglecting valuable data that may
otherwise be inaccessible. If a security breach exposes data about illegal
activity, this data might be useful for
gaining a better understanding of how
to combat it.
Handle with Caution
There are a few general conclusions to be
derived from this summary of the arguments for and against using data exposed by a security breach. The risks to
those described in the data and the additional burdens such data imposes
on researchers mean that data from security breaches should only be used as a
last resort. However, there are cases
where obtaining data ethically is im-