sifted through anonymized data of 20
million searches performed by 658,000
America Online subscribers. The researchers were able to cull sensitive
information—including Social Security
numbers, credit card numbers, addresses, and personal habits—by looking at
all the searches of a single user (each
user received a single randomized number). These same identification methods
can be used for social networking sites
and to parse through data contained in
search engines, Dwork says.
The repercussions are enormous. For
example, in the 1990s, a health insurance company that provided coverage
for all state employees in the Commonwealth of Massachusetts released general data about the medical histories of
anonymized individuals for general research purposes. Only the date of birth,
gender, and ZIP code of residence were
left in the data. However, a researcher,
Latanya Sweeney, now an associate professor of computer science at Carnegie
Mellon University, identified the medical history for William Weld, then the
governor of Massachusetts. This was
possible because the database contained only six people who shared his
date of birth, only three of them were
men, and Weld was the only one of the three who lived in
his five-digit ZIP code.
What’s remarkable, Dwork says, is
that each data element alone isn’t a
privacy risk. “Most people would probably say, ‘No big deal.’ Yet, putting these
three elements together is enough to
identify approximately two-thirds of the
population,” says Dwork.
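The effect Dwork describes can be checked on any table by counting how many records become unique once the fields are combined. The following is a minimal sketch in Python, not Sweeney's actual method; the records and the helper name `unique_fraction` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy records: (date_of_birth, gender, zip_code).
# These values are invented for illustration and come from no real data set.
records = [
    ("1950-01-15", "M", "02101"),
    ("1950-01-15", "M", "02102"),
    ("1950-01-15", "F", "02101"),
    ("1962-03-02", "F", "02103"),
    ("1962-03-02", "F", "02103"),
]


def unique_fraction(rows, columns):
    """Return the fraction of rows whose values in `columns` match no other row."""
    keys = [tuple(row[i] for i in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# Date of birth alone singles out nobody in this toy table...
dob_only = unique_fraction(records, [0])
# ...but date of birth, gender, and ZIP code together single out three of five rows.
all_three = unique_fraction(records, [0, 1, 2])

print(f"unique on date of birth alone: {dob_only:.0%}")
print(f"unique on all three fields:    {all_three:.0%}")
```

In this toy table no record is unique on date of birth alone, yet three of the five are unique once gender and ZIP code are added. Any record that is unique on the combined fields can be linked to a name by anyone who holds a second list containing the same three fields.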
Preserving Privacy
As government agencies, research institutes, companies, and nonprofit organizations search for ways to boost the
value of their data, the pressure to develop better privacy-protecting methods and systems is increasing. “Despite
good intentions and software tools designed to thwart breaches, breakdowns
continue to take place,” says Frank McSherry, a researcher at Microsoft Research Silicon Valley.
Privacy-preserving efforts have undergone a steady evolution during
the last quarter-century. Statistics, security, cryptography, and databases
have all emerged as topics of interest.
However, actual solutions have remained elusive, largely because there’s
no way to guarantee data privacy with
ad-hoc tools and methods. Cryptography, for example, is fine for protecting
data from a security standpoint, but it
does nothing to mitigate data mining
and sophisticated analysis of publicly
released or anonymized data. In fact,
mathematically rigorous methods
have demonstrated that the 25-year-
old concept of “semantic security”
cannot be achieved for statistical
databases.
Differential privacy, which first
emerged in 2006 (though its roots go
back to 2001), could provide the tipping point for real change. By introducing random noise and ensuring that a
database behaves the same—
independent of whether any individual or small
group is included or excluded from the
data set, thus making it impossible to
tell which data set was used—it’s possible to prevent personal data from
being compromised or misused. Pennsylvania State University’s Smith says
that differential privacy can be applied
to numerous environments and settings. “It creates a guideline for defining whether something is acceptable
or not,” he says.
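The article does not name a specific noise-adding technique. One standard construction from the differential privacy literature is the Laplace mechanism, sketched below in Python for a simple counting query. The function names, the toy data, and the choice of `epsilon` are assumptions made for illustration only.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centered at 0.

    The difference of two independent exponential samples, each with
    mean `scale`, follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(rows, predicate, epsilon, rng):
    """Answer "how many rows satisfy predicate?" with Laplace noise added.

    Adding or removing one person changes a count by at most 1, so noise
    with scale 1 / epsilon is the standard calibration for this query.
    """
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of people in a database, with and without one person.
with_person = [34, 51, 47, 62, 29, 58]
without_person = with_person[:-1]

rng = random.Random()  # pass a seeded Random(...) for reproducible runs
epsilon = 0.5          # assumed privacy parameter; smaller means more noise


def over_50(age):
    return age > 50


print(noisy_count(with_person, over_50, epsilon, rng))
print(noisy_count(without_person, over_50, epsilon, rng))
```

The two printed answers are drawn from distributions that overlap heavily, so a single released answer gives little evidence about whether the sixth person was in the data. The price is accuracy: each answer is off by a random amount whose typical size is 1/epsilon.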
Government, academic, and business leaders have shown some interest in differential privacy, although the
concept is still in the early stages of development and implementation. Currently, differential privacy appears to be
the only approach that offers a solid and
well-defined method for achieving privacy—without making any assumptions
about the adversaries’ strategy. “There
has been a lot of positive feedback about
the concept, though it is clearly on the
upward slope,” McSherry says. “We believe that with further analysis, testing,
and tweaking, differential privacy could
emerge within the next several years as
the gold standard for privacy.”
Samuel Greengard is a freelance writer based in West
Linn, OR.
Computer Science
In Memoriam: Randy Pausch
To most of his followers around
the world, Randy Pausch was
known as the terminally ill college
professor who taught millions
how to live. His YouTube-based
“Last Lecture,” given a year ago at
Carnegie Mellon University after
doctors told him his pancreatic
cancer had progressed and he
had only months to live, became
an international sensation. His
subsequent best-selling book
of the same title was translated
into over two dozen languages.
To the computing community,
Pausch, who died on July 25 at the
age of 47, will be forever known
as a pioneer in virtual reality. His
leadership in the Alice project
revolutionized how introductory
programming can be taught. As
co-founder of the Entertainment
Technology Center at CMU,
he developed a facility for
bridging computer science with
entertainment technology. Pausch,
an ACM Fellow, was an active leader
in SIGGRAPH and SIGCHI and
a frequent contributor to many
ACM publications; indeed, while
undergoing cancer treatments
he co-authored two articles for
Communications’ special section on
a science of games (July 2007).
On p. 19 we present a Q&A with
Pausch conducted just weeks prior
to his death. Like everything he did
in life, he gave graciously of his time
when it came to discussing computer
science and his regard for students.