sifted through anonymized data of 20 million searches performed by 658,000 America Online subscribers. The researchers were able to cull sensitive information—including Social Security numbers, credit card numbers, addresses, and personal habits—by looking at all the searches of a single user (each user received a single randomized number). These same identification methods can be used for social networking sites and to parse through data contained in search engines, Dwork says.
The repercussions are enormous. For example, in the 1990s, a health insurance company that provided coverage for all state employees in the Commonwealth of Massachusetts released data about the medical histories of anonymized individuals for general research purposes. Only the date of birth, gender, and ZIP code of residence were left in the data. However, a researcher, Latanya Sweeney, now an associate professor of computer science at Carnegie Mellon University, identified the medical history of William Weld, then the governor of Massachusetts. This was possible because the database contained only six people who shared his date of birth, only three of them were men, and Weld was the only one of those in his five-digit ZIP code.
What’s remarkable, Dwork says, is that each data element alone isn’t a privacy risk. “Most people would probably say, ‘No big deal.’ Yet, putting these three elements together is enough to identify approximately two-thirds of the population,” says Dwork.
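The linkage reasoning above can be sketched in a few lines. This is only an illustration under stated assumptions: the rows, field names, and the `singled_out` helper are hypothetical and are not taken from the Massachusetts data or from Sweeney's work. The sketch counts how many records carry a combination of birth date, gender, and ZIP code that no other record shares.

```python
from collections import Counter

# Hypothetical field names for the three attributes left in the release.
QUASI_IDENTIFIERS = ("birth_date", "gender", "zip")


def singled_out(records, keys=QUASI_IDENTIFIERS):
    """Return the records whose combination of `keys` is unique in `records`."""

    def combo(record):
        return tuple(record[k] for k in keys)

    counts = Counter(combo(r) for r in records)
    return [r for r in records if counts[combo(r)] == 1]


# Entirely made-up rows standing in for an "anonymized" release.
records = [
    {"birth_date": "1950-01-01", "gender": "M", "zip": "00001", "diagnosis": "A"},
    {"birth_date": "1950-01-01", "gender": "M", "zip": "00002", "diagnosis": "B"},
    {"birth_date": "1950-01-01", "gender": "F", "zip": "00001", "diagnosis": "C"},
    {"birth_date": "1962-03-04", "gender": "F", "zip": "00003", "diagnosis": "D"},
    {"birth_date": "1962-03-04", "gender": "F", "zip": "00003", "diagnosis": "E"},
]

exposed = singled_out(records)
print(len(exposed), "of", len(records), "records are unique on", QUASI_IDENTIFIERS)
```

No single field isolates anyone in this toy data set, yet the three fields together do. Anyone holding a second list that pairs names with the same three fields, such as a public roll, could attach a name to each unique row.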
As government agencies, research institutes, companies, and nonprofit organizations search for ways to boost the value of their data, the pressure to develop better privacy-protecting methods and systems is increasing. “Despite good intentions and software tools designed to thwart breaches, breakdowns continue to take place,” says Frank McSherry, a researcher at Microsoft Research Silicon Valley.
Privacy-preserving efforts have undergone a steady evolution during the last quarter-century. Statistics, security, cryptography, and databases have all emerged as topics of interest. However, actual solutions have remained elusive, largely because there’s no way to guarantee data privacy with ad-hoc tools and methods. Cryptography, for example, is fine for protecting data from a security standpoint, but it does nothing to mitigate data mining and sophisticated analysis of publicly released or anonymized data. In fact, mathematically rigorous methods have demonstrated that the 25-year-old concept of “semantic security” cannot be achieved for statistical databases.
Differential privacy, which first emerged in 2006 (though its roots go back to 2001), could provide the tipping point for real change. The approach introduces random noise and ensures that a database behaves the same whether any individual or small group is included in or excluded from the data set. Because it is then impossible to tell which data set was used, personal data can be protected from being compromised or misused. Pennsylvania State University’s Smith says that differential privacy can be applied to numerous environments and settings. “It creates a guideline for defining whether something is acceptable or not,” he says.
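As a rough illustration of the noise-adding idea, the sketch below applies the Laplace mechanism, one standard way of achieving differential privacy for a counting query. The article does not say which mechanism the researchers use, so the choice is an assumption, as are the function names, the sample ages, and the privacy parameter `epsilon` (smaller values mean more noise and stronger protection).

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(records, predicate, epsilon, rng=None):
    """Answer "how many records satisfy predicate?" with Laplace noise.

    Adding or removing one record changes a count by at most 1, so noise
    with scale 1/epsilon is the usual calibration for this kind of query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: the same query with and without the last person.
ages = [34, 51, 29, 62, 47, 58]
over_50 = lambda age: age >= 50
print("with person:   ", noisy_count(ages, over_50, epsilon=0.5))
print("without person:", noisy_count(ages[:-1], over_50, epsilon=0.5))
```

Individual answers vary from run to run, so a single response reveals little about whether the last person was in the data, while the average over many people-level statistics stays close to the truth.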
Government, academic, and business leaders have shown some interest in differential privacy, although the concept is still in the early stages of development and implementation. Currently, differential privacy appears to be the only approach that offers a solid and well-defined method for achieving privacy—without making any assumptions about the adversaries’ strategy. “There has been a lot of positive feedback about the concept, though it is clearly on the upward slope,” McSherry says. “We believe that with further analysis, testing, and tweaking, differential privacy could emerge within the next several years as the gold standard for privacy.”
Samuel Greengard is a freelance writer based in West Linn, OR.
To most of his followers around the world, Randy Pausch was known as the terminally ill college professor who taught millions how to live. His YouTube-based “Last Lecture,” given a year ago at Carnegie Mellon University after doctors told him his pancreatic cancer had progressed and he had only months to live, became an international sensation. His subsequent best-selling book of the same title was translated into over two dozen languages.
To the computing community, Pausch, who died on July 25 at the age of 47, will be forever known as a pioneer in virtual reality. His leadership in the Alice project revolutionized how introductory programming can be taught. As co-founder of the Entertainment Technology Center at CMU, he developed a facility for bridging computer science with entertainment technology. Pausch, an ACM Fellow, was an active leader in SIGGRAPH and SIGCHI and a frequent contributor to many ACM publications; indeed, while undergoing cancer treatments he co-authored two articles for Communications’ special section on a science of games (July 2007).
On p. 19 we present a Q&A with Pausch conducted just weeks prior to his death. As with everything he did in life, he gave graciously of his time when it came to discussing computer science and his regard for students.