Society | DOI: 10.1145/1378727.1378734
Samuel Greengard
Privacy Matters
As concerns about protecting personal data increase,
differential privacy offers a promising solution.
Open a newspaper or a Web
browser and you’re certain
to encounter a spate of stories about the misuse or loss
of data and how it puts personal information at risk. Over the last
decade, as computers and databases
have grown ever more sophisticated,
privacy concerns have moved to center stage. Today, government agencies
worry about keeping highly sensitive
financial and health data private. Corporations fret over protecting customer
records. And the public grows ever more
wary—and distrustful—of organizations
that handle sensitive data.
“Privacy issues aren’t about to go
away,” observes Adam Smith, an assistant professor in the computer science and engineering department at
the Pennsylvania State University. “One
problem we face is that ‘privacy’ is an
overloaded term. It means different
things to different people and a lot of issues hinge on context. As a result, it is extremely difficult to create effective solutions and protections—and to gain the
trust that is necessary for respondents
to answer sensitive questions honestly.”
Some 220 million private records
have been lost or stolen in the United
States since January 2005, according
to the Privacy Rights Clearinghouse, a
San Diego, CA-based organization that
tracks privacy issues. While no worldwide statistics exist, it’s entirely apparent that a tangle of regulations, laws, and
best practices cannot solve the problem.
Worse, increasingly sophisticated tools
make it possible to piece information together and glean details and facts about
people in a way that wasn’t imaginable a
few years ago.
Now, a handful of researchers, mathematicians, and computer scientists are
hoping to alter the landscape and frame
the debate in new and important ways.
Introducing a concept that has been
dubbed “differential privacy,” these data
experts are seeking to use mathematical
equations and algorithms to standardize
the way computers—and organizations—protect personal data while
revealing overall statistical trends. The
goal, says Cynthia Dwork, a principal
researcher at Microsoft, is to ensure that
an adversary cannot compromise data
when he or she combines the released
statistics with other external sources of
information. “It’s an extremely attractive approach,” she says.
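
The passage above does not name a specific mechanism. As a minimal sketch only, one standard technique from the differential privacy literature is to add Laplace-distributed noise to a released count, scaled to 1/epsilon because adding or removing one person changes a count by at most 1. The function names, the sample records, and the epsilon value below are illustrative assumptions, not details from the researchers quoted here.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(records, predicate, epsilon, rng=None):
    """Release a count with Laplace noise (illustrative sketch).

    A counting query changes by at most 1 when one record is added or
    removed, so noise with scale 1/epsilon is the usual choice.
    """
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Hypothetical data: each record is (age, has_condition).
records = [(34, True), (51, False), (29, True), (62, True), (45, False)]
released = noisy_count(records, lambda r: r[1], epsilon=0.5)
print(released)  # a noisy value near the true count of 3; varies per run
```

A smaller epsilon means more noise and stronger protection for any one individual, at the cost of a less accurate statistic.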
Connecting the Dots
The ability to collect and analyze vast
data sets offers substantial promise.
Sifting through medical data, genotype
and phenotype connections, epidemiological statistics, and their correlation
with events such as chemical spills or
dietary and exercise patterns can help
dictate public policy and find preventive
strategies and cures for real people with
real afflictions.
Yet, protecting privacy is an increasingly tricky proposition and one that
confounds a growing number of organizations. Beyond the widely publicized
hacker attacks and security lapses,
there’s an escalating threat of a person or
organization assembling enough pieces
of seemingly benign data—sometimes
from different sources—to create a useful snapshot of a person or group. Kobbi
Nissim, an assistant professor of computer science at Ben-Gurion University,
describes this approach as “connecting
the dots.” Oftentimes, it involves culling
seemingly unrelated data from diverse
and disparate sources.
It’s not an abstract concept. When
online movie rental firm Netflix decided
to improve its recommendation system
in 2007, executives emphasized that they
would provide complete customer anonymity to participants. Netflix designed
a system that retained the date of each
movie rating along with the title and year
of its release. And it assigned randomized numbers in place of customer IDs.
This seemed like a perfect system
until a pair of researchers—graduate
student Arvind Narayanan and professor Vitaly Shmatikov, both from the department of computer sciences at the
University of Texas at Austin—proved
that it was possible to identify individuals among a half-million participants by
using public reviews published in the Internet Movie Database (IMDb) to identify movie ratings within Netflix’s data. In
fact, eight ratings along with dates were
enough to provide 99% accuracy, according to the researchers.
This type of privacy violation is
known as a linkage attack: attackers use
innocuous data in one data set to identify a record in a second data set that holds
both innocuous and sensitive data. It
has serious repercussions, Dwork says.
It could identify someone who is gay or
has an interest in extremely violent or
pornographic films. Such information
might potentially interfere with a person’s employment or affect his or her
ability to rent an apartment or belong
to a religious organization. “It could
result in public humiliation,” says
Dwork, who notes that “the conclusion may be wrong. Partners share accounts. People buy gifts, and they may
have some other reason for renting or
buying certain movies.”
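
The linkage attack described above can be illustrated with a toy sketch. This is not the Narayanan and Shmatikov algorithm; it is a simplified matcher, and the data, names, date tolerance, and match threshold are all hypothetical. It links a public reviewer to a pseudonymous record when enough ratings agree on title, score, and approximate date.

```python
from datetime import date

# Hypothetical "anonymized" release: random ID -> {title: (stars, rating date)}
anonymized = {
    1001: {"Film A": (5, date(2005, 3, 1)),
           "Film B": (2, date(2005, 3, 9)),
           "Film C": (4, date(2005, 4, 2))},   # Film C was never reviewed publicly
    1002: {"Film A": (3, date(2005, 6, 1)),
           "Film D": (5, date(2005, 6, 3))},
}

# Hypothetical public reviews posted under a real name.
public_reviews = {
    "alice": {"Film A": (5, date(2005, 3, 2)),
              "Film B": (2, date(2005, 3, 10))},
}

def match_score(public, private, max_days=3):
    """Count titles rated with the same stars within max_days of each other."""
    score = 0
    for title, (stars, when) in public.items():
        if title in private:
            p_stars, p_when = private[title]
            if p_stars == stars and abs((p_when - when).days) <= max_days:
                score += 1
    return score

def link(public_reviews, anonymized, min_matches=2):
    """Map each public name to an anonymized ID when one record clearly wins."""
    links = {}
    for name, reviews in public_reviews.items():
        scores = {pid: match_score(reviews, rec) for pid, rec in anonymized.items()}
        best = max(scores, key=scores.get)
        unique_best = list(scores.values()).count(scores[best]) == 1
        if scores[best] >= min_matches and unique_best:
            links[name] = best
    return links

print(link(public_reviews, anonymized))  # {'alice': 1001}
```

Once the link is made, every other rating in record 1001, including the one for "Film C" that was never posted publicly, is attributed to the named reviewer.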
It’s not the first time such an event
has taken place. In 2006, researchers