reasonable de-identification and/or
additional data control techniques as
appropriate; and develop a monitoring, accountability, and breach response plan.
These requirements would be informed by the nascent industry standards under development by NIST and
others, including accepted de-identification and SDL techniques as well
as a consideration of the risk vectors
described here. 2 Of course, those who
engage in unauthorized re-identification are also culpable, and it might be
worthwhile to supplement contractual
or statutory obligations not to engage
in re-identification with severe civil (or
even criminal) penalties for intentional violations that cause harm. 3 It is important that any such statutory prohibitions also include robust exemptions
for security research into de-identification and related topics.
A risk-based approach recognizes
there is no perfect anonymity. It focuses on process rather than output. Yet
effective risk-based data release policy
also avoids a ruthless pragmatism by
acknowledging the limits of current
risk projection models and building
in important protections for individual privacy. This policy-driven, integrated, and comprehensive approach
will help us better protect data while
preserving its utility.
1. Cavoukian, A. and El Emam, K. Dispelling the Myths
Surrounding Deidentification: Anonymization Remains
a Strong Tool for Protecting Privacy. Information
and Privacy Commissioner of Ontario, 2011;
2. Garfinkel, S.L. De-Identification of Personal
Information. National Institute of Standards and
Technology, 2015; http://bit.ly/2cz28ge
3. Gellman, R. The deidentification dilemma: A legislative
and contractual proposal. 21 Fordham Intell. Prop.
Media & Ent. L. J. 33, 2010.
4. Hartzog, W. and Solove, D. J. The scope and potential
of FTC data protection. 83 Geo. Washington Law
Review 2230, 2015.
5. Kinney, S.K. et al. Data confidentiality: The next five
years summary and guide to papers. J. Privacy and
Confidentiality 125 (2009).
6. Narayanan, A. and Felten, E. W. No silver bullet:
De-identification still doesn’t work, 2014; http://bit.
7. Narayanan, A. and Shmatikov, V. Robust de-anonymization of large sparse datasets. In
Proceedings of the 2008 29th IEEE Symposium on
Security and Privacy 111.
Woodrow Hartzog (firstname.lastname@example.org) is a Starnes
Professor of Law with the Cumberland School of Law at
Ira Rubinstein (email@example.com) is a Senior Fellow
at the Information Law Institute at New York University
School of Law.
Copyright held by author.
protecting the privacy and confidentiality of data subjects. SDL can be thought
of in terms of three major forms of interaction between researchers and personal data: direct access (which covers
access to data by qualified investigators
who must agree to licensing terms and
access datasets securely); dissemination-based access (which includes de-identification); and query-based access
(which includes but is not limited to
differential privacy). 5
Adopting the SDL frame helps to clarify
several contested issues in the current
de-identification debate. First, the most urgent need
today is not for improved de-identification methods alone but also for
research that provides agencies with
methods and tools for making sound
decisions about SDL. Second, the SDL
literature calls attention to the fact
that researchers in statistics and computer science pursue very different approaches to confidentiality and privacy
and all too often do so in isolation from
one another. They might achieve better
results by collaborating across methodological divides. Third, the legal
scholars who have written most forcefully on this topic tend to evaluate the
pros and cons of de-identification in
isolation from other SDL methods. Debates focusing exclusively on the merits or demerits of de-identification are
incomplete. SDL techniques should be
part of most regulators’ toolkits.
The Way Forward: Minimizing Risk
Most importantly, SDL can be leveraged to move de-identification policy
toward a process of minimizing risk.
A risk-based approach would seek to
tailor SDL techniques and related legal mechanisms to an organization’s
anticipated privacy risks. For example,
if the federal agency administering the
HIPAA Privacy Rule (Health and Human Services) fully embraced a risk-based approach, this would transform
the rule into something more closely
resembling the law of data security. 4
Such an approach would have three features:
Process-based: Organizations engaged
in releasing data to internal,
trusted, or external recipients should
assume responsibility for protecting
data subjects against privacy harms by
imposing technical restrictions on access,
using adequate de-identification
procedures, and/or relying on query-based
methods, all in combination
with legal mechanisms, as appropriate.
Contextual: Sound methods for
protecting released datasets are always
contingent upon the specific
scenario of the data release. There are
at least seven variables to consider in
any given context, many of which have
been previously identified in reports
by the National Institute of Standards
and Technology (NIST) and others.
They include data volume, data sensitivity,
type of data recipient, data
use, data treatment technique, data
access controls, and consent and con-
Tolerant of risk: The field of data
security has long acknowledged there
is no such thing as perfect security. If
the Weld, AOL, and Netflix re-identification
incidents prove anything, it is
that perfect anonymization also is a
myth. By focusing on process instead
of output, data release policy can aim
to raise the cost of re-identification
and sensitive attribute disclosure to
acceptable levels without having to
ensure perfect anonymization.b
b This Viewpoint is based on a longer article by
the co-authors, which provides a more detailed
discussion of these three factors; see Rubinstein,
I. and Hartzog, W. Anonymization and
Risk. 91 Washington Law Review 703, 2016.