IMAGE BY ALICIA KUBISTA/ANDRIJ BORYS ASSOCIATES
such data. For example, unethically obtained data is likely to include personal
information that would otherwise have
been removed or anonymized. Researchers should ensure that any information allowing individuals to be identified is removed when they clean the
data for analysis and publication.
Another argument is that since the
data is already publicly accessible, it
can be used the same way as any other
publicly accessible data. 5 The methods
and motives behind the data’s release
are only relevant for evaluating
its quality. Given the likelihood that illegal
methods were used to obtain the data,
the source will frequently attempt to
maintain their anonymity to
the ethical issues associated with using
such data was inconsistent. 8 While
the ACM Code of Ethics lists ‘avoiding
harm’ as a general principle, 1 the possible
harms of using the data are not
always clear.
There are various kinds of unethically obtained data that might interest computing researchers. The datasets resulting from security breaches
may be password dumps, databases
of internal message boards, financial and personal data, or classified
information. 7 Such information may
be released onto the Internet by
whistleblowers, through deliberate infiltration of a secure network by
outsiders, or through accidental disclosure
caused by weak security practices.
We should distinguish between
data unethically obtained by the researchers themselves, and data unethically obtained and released by third
parties. The first case is straightforward: researchers should always reject
using unethical methods. Institutional
Review Boards (IRBs) and research ethics committees should be consulted if
there are concerns about the methods
of collecting data. Legal privacy protections also limit what researchers can
collect and the methods they can employ. It is less straightforward, though,
if a third party has collected data using unethical (and potentially illegal)
methods and then released it publicly.
Consider a whistleblower releasing
confidential documents that reveal
wrongdoing by governments, companies, or institutions. While it may be
illegal for the whistleblower to release
these documents, it is not clear that researchers should ignore their contents if they
are of significant research value and
public interest. A blanket prohibition
against using unethically obtained
data may prevent socially beneficial research from occurring. The arguments
for and against using unethically obtained data should therefore be considered for each individual case.
Justifying Using Exposed Data
The most straightforward justification
for using data exposed by a security
breach is that the potential benefits to
society from utilizing that data outweigh
the harms caused by obtaining
it. The researchers must therefore offer
a compelling justification for how
using the data will benefit society. If
the data describes illegal or harmful
activity, the obvious justification is
that research using it may be used to
prevent or limit such activity in the
future. This recognizes that the means
used to obtain the data were wrong
but defends the researchers’ use of it
as a means to prevent or reduce another
form of wrongdoing.
This argument’s effectiveness depends on both the seriousness of the
unethical methods used to obtain the
data and the potential significance of
the research’s benefits to society. The
researchers should also attempt to
minimize any further harm that may
occur from publishing research using