Image by Alicia Kubista/Andrij Borys Associates
this makes it very difficult for policy
makers to judge whether the HIPAA
de-identification rules should be maintained, reformed, or abandoned.
These divergent views might lead
us to different regulatory approaches. Those who focus on the remote
possibility of re-identification might
prefer an approach that reserves punishment for the rare instances of
harm, such as a negligence or strict liability regime revolving around harm
triggers. Critics of anonymization
might suggest we abandon de-identification-based approaches altogether,
in favor of different privacy protections focused on collection, use, and
disclosure that draw from the Fair Information Practice Principles, often
called the FIPPs.
These problems with the de-identification
debate are frustrating sound
data-use policy. But there is a way
forward. Regulators should incorporate
the full gamut of Statistical Disclosure
Limitation (SDL) methods and techniques
into privacy law and policy,
rather than relying almost exclusively
on de-identification techniques that
only modify and obfuscate data. SDL
comprises the principles and techniques
that researchers have developed
for disseminating official statistics and
other data for research purposes while
projects. (It also heightens consumer
mistrust of e-commerce firms offering
their own dubious “guarantees” of
anonymization, thereby reinforcing the
“privacy is dead” meme.)
The community of computer scientists, statisticians, and epidemiologists
who write about de-identification and
re-identification is deeply divided,
not only in how they view the implications of the auxiliary information problem, but also in their goals, methods, interests, and measures of success. Indeed,
we have found that the experts fall into
two distinct camps. First, there are
those who may be categorized as
“pragmatists” based on their familiarity with
and everyday use of de-identification
methods and the value they place on
practical solutions for sharing useful data to advance the public good. 1
Second, there are those who might be
called “formalists” because of their
insistence on mathematical rigor in
defining privacy, modeling adversaries,
and quantifying the probability of
re-identification. 6 Pragmatists devote a
great deal of effort to devising methods
for measuring and managing the risk
of re-identification for clinical trials
and other specific disclosure scenarios.
Unlike their formalist adversaries,
they consider it difficult to gain access
to auxiliary information and consequently
give little weight to attacks
demonstrating that data subjects are
distinguishable and unique but that
(mostly) fail to re-identify anyone on
an individual basis. Rather, they argue
that empirical studies and meta-analyses
show that the risk of re-identification
in properly de-identified datasets
is, in fact, very low.
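To make the pragmatists' risk measurement concrete, the sketch below computes, for a toy table, what fraction of records are unique on a chosen set of quasi-identifiers, along with a simple per-record risk equal to 1 divided by the size of the record's equivalence class (the group of records sharing the same quasi-identifier values). The records, the quasi-identifier choice (zip3, birth_year, sex), and the helper names are all hypothetical, and this is only one common metric under an assumed adversary who already knows the target is in the data, not the specific method used by the researchers cited here.

```python
from collections import Counter

# Hypothetical toy records; the quasi-identifier columns are an assumption.
RECORDS = [
    {"zip3": "021", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"zip3": "021", "birth_year": 1970, "sex": "F", "diagnosis": "flu"},
    {"zip3": "021", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
    {"zip3": "100", "birth_year": 1970, "sex": "F", "diagnosis": "flu"},
    {"zip3": "100", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
    {"zip3": "100", "birth_year": 1970, "sex": "F", "diagnosis": "migraine"},
]
QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def equivalence_class_sizes(records, qids):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in qids) for r in records)


def risk_summary(records, qids):
    """Return (fraction of unique records, max per-record risk, mean per-record risk).

    Per-record risk is 1 / (size of the record's equivalence class): the chance
    of picking the right record if an adversary knows the target's
    quasi-identifier values and knows the target is in the dataset.
    """
    sizes = equivalence_class_sizes(records, qids)
    per_record = [1 / sizes[tuple(r[q] for q in qids)] for r in records]
    unique = sum(1 for p in per_record if p == 1.0)
    return unique / len(records), max(per_record), sum(per_record) / len(records)


if __name__ == "__main__":
    print(risk_summary(RECORDS, QUASI_IDENTIFIERS))
```

In this toy table one of the six records is unique, so its risk is 1. As the pragmatists stress, being unique in a dataset does not by itself name anyone; that step requires auxiliary information linking the attributes to a person.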
Formalists, on the other hand, argue that efforts to quantify the efficacy
of de-identification “are unscientific
and promote a false sense of security by
assuming unrealistic, artificially constrained models of what an adversary
might do.” 6 Unlike the pragmatists,
they take proof-of-concept demonstrations of re-identification very seriously, while minimizing the importance
of empirical studies showing low rates
of re-identification in practice.
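The auxiliary information problem behind the proof-of-concept demonstrations that formalists emphasize can be illustrated with a minimal linkage sketch: an adversary joins a release stripped of names against an auxiliary list that still carries names, matching on shared quasi-identifiers. All data, field names, and functions below are hypothetical and chosen only for illustration.

```python
# Hypothetical de-identified release: names removed, quasi-identifiers kept.
RELEASE = [
    {"zip3": "021", "birth_year": 1985, "sex": "M", "diagnosis": "diabetes"},
    {"zip3": "100", "birth_year": 1970, "sex": "F", "diagnosis": "flu"},
    {"zip3": "100", "birth_year": 1970, "sex": "F", "diagnosis": "asthma"},
]
# Hypothetical auxiliary data an adversary might hold (e.g., a public roster).
AUXILIARY = [
    {"name": "Person A", "zip3": "021", "birth_year": 1985, "sex": "M"},
    {"name": "Person B", "zip3": "100", "birth_year": 1970, "sex": "F"},
    {"name": "Person C", "zip3": "945", "birth_year": 1990, "sex": "F"},
]
QUASI_IDENTIFIERS = ("zip3", "birth_year", "sex")


def link(release, auxiliary, qids):
    """Map each auxiliary name to the release records that match it on every qid."""
    matches = {}
    for person in auxiliary:
        key = tuple(person[q] for q in qids)
        matches[person["name"]] = [
            r for r in release if tuple(r[q] for q in qids) == key
        ]
    return matches


def singled_out(matches):
    """Keep names that match exactly one release record (candidate re-identifications)."""
    return {name: rows[0] for name, rows in matches.items() if len(rows) == 1}


if __name__ == "__main__":
    for name, record in singled_out(link(RELEASE, AUXILIARY, QUASI_IDENTIFIERS)).items():
        print(name, "->", record["diagnosis"])
```

In this toy run, Person A matches exactly one released record, Person B matches two, and Person C matches none. A single match is only a candidate re-identification: it is correct only if the named person is actually in the release and no one else in the wider population shares those attributes, which is precisely where the two camps disagree about how much weight such demonstrations deserve.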
This split among the experts is
concerning for several reasons. Pragmatists and formalists represent distinct disciplines with very different
histories, questions, methods, and objectives. Accordingly, they have shown
little inclination to engage in fruitful
dialogue, much less to join together
and find ways to resolve their differences or place de-identification on firmer
foundations that would eliminate or at
least reduce the skepticism and uncertainty that currently surround it. And