attributes that constitute PII. For example, the Data Protection Directive
defines personal data as: “any information relating to an […] natural person
[…] who can be identified, directly or
indirectly, in particular by reference
[…] to one or more factors specific to
his physical, physiological, mental,
economic, cultural, or social identity.”
Illustration by John Hersey
The Directive goes on to say that
“account should be taken of all the
means likely reasonably to be used either by the controller[a] or by any other
person to identify the said person.”
Similarly, the HIPAA Privacy Rule defines individually identifiable health
information as information “1) That
identifies the individual; or 2) With
respect to which there is a reasonable
basis to believe the information can be
used to identify the individual.” What
is “reasonable”? This is left open to
interpretation by case law. We are not
aware of any court decisions that define identifiability in the context of
HIPAA.[b] The “safe harbor” provision
of the Privacy Rule enumerates 18 specific identifiers that must be removed
prior to data release, but the list is not
intended to be comprehensive.
[a] The individual or organization responsible for the safekeeping of personal information.
PII and Privacy Protection Technologies
Many companies that collect personal
information, including social networks, retailers, and service providers,
assure customers that their information will be released only in a
“non-personally identifiable” form. The underlying assumption is that “personally
identifiable information” is a fixed set
of attributes such as names and contact
information. Once data records have
been “de-identified,” they magically
become safe to release, with no way of
linking them back to individuals.
[b] When the Supreme Court of Iceland struck down an act authorizing a centralized database of “non-personally identifiable” health data, its ruling included factors such as education, profession, and specification of a particular medical condition as part of “identifiability.”
The natural approach to privacy protection is to consider both the data and
its proposed use(s) and to ask: What
risk does an individual face if her data
is used in a particular way? Unfortunately, existing privacy technologies
such as k-anonymity[6] focus instead on
the data alone. Motivated by an attack
in which hospital discharge records
were re-identified by joining[c] them via
common demographic attributes with
a public voter database,[5] these methods aim to make joins with external
datasets harder by anonymizing the identifying attributes. They fundamentally
rely on the fallacious distinction between “identifying” and
“non-identifying” attributes. This distinction might
have made sense in the context of the
original attack, but is increasingly
meaningless as the amount and variety
of publicly available information about
individuals grows exponentially.
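The join attack and the data-only defense described above can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the records are invented, the three demographic columns (`zip`, `birth_date`, `sex`) stand in for the "common demographic attributes," and the helper names `key`, `k_of`, `link`, and `generalize` are hypothetical. It is not the method of any particular anonymization tool.

```python
from collections import Counter

# Hypothetical toy data; every record and field name is invented for illustration.
# Assumption: these three columns are the ones treated as "identifying".
QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")

released = [  # "de-identified" discharge records: names removed
    {"zip": "02138", "birth_date": "1945-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02137", "birth_date": "1945-02-10", "sex": "F", "diagnosis": "fracture"},
    {"zip": "02139", "birth_date": "1962-01-15", "sex": "M", "diagnosis": "asthma"},
    {"zip": "02139", "birth_date": "1962-08-09", "sex": "M", "diagnosis": "diabetes"},
]

voters = [  # public voter list: names present
    {"name": "Alice Example", "zip": "02138", "birth_date": "1945-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1962-01-15", "sex": "M"},
    {"name": "Carol Example", "zip": "02140", "birth_date": "1980-03-02", "sex": "F"},
]


def key(record):
    """Tuple of the record's quasi-identifier values, used as the join key."""
    return tuple(record[attr] for attr in QUASI_IDENTIFIERS)


def k_of(records):
    """Size of the smallest group of records sharing the same join key."""
    return min(Counter(key(r) for r in records).values())


def link(release, public):
    """Return (name, diagnosis) pairs whose join key is unique in both datasets."""
    release_counts = Counter(key(r) for r in release)
    public_counts = Counter(key(p) for p in public)
    names = {key(p): p["name"] for p in public}
    return [
        (names[key(r)], r["diagnosis"])
        for r in release
        if release_counts[key(r)] == 1 and public_counts[key(r)] == 1
    ]


def generalize(record):
    """Coarsen the quasi-identifiers: 4-digit ZIP prefix and birth year only."""
    coarse = dict(record)
    coarse["zip"] = record["zip"][:4] + "*"
    coarse["birth_date"] = record["birth_date"][:4]
    return coarse


# Exact join on the raw release: two records link to a unique named voter.
reidentified = link(released, voters)

# After generalization every join key is shared by two released records.
coarse_release = [generalize(r) for r in released]
coarse_voters = [generalize(v) for v in voters]
reidentified_after = link(coarse_release, coarse_voters)
```

In this toy run the raw release has k = 1 and two of its records link to unique named voters. After generalization k = 2 and the same join yields no unique match. The protection covers only the three columns designated as identifying; the `diagnosis` column is left untouched, so the defense depends entirely on having drawn the line between "identifying" and "non-identifying" attributes in the right place.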