It is not an easy task.
“We have to do it fast, and we have
to do it well,” says Abowd. He readily admits the tight timeline and volume of work are heavy burdens, but these are not the only obstacles.
The community of researchers who
use Census data will be dealing with
data in 2020 that has a new system of
protection applied to it, and not everyone is happy about that.
One outspoken critic is Steven Ruggles, Regents Professor of History and
Population Studies at the University of
Minnesota, and director of the Institute
for Social Research and Data Innovation, which is focused on advancing
“our knowledge of societies and populations across time and space, including
economic and demographic behavior,
health, well-being, and human-environment interactions.” Ruggles regularly
uses Census data in his work, and says
the use of differential privacy could limit the ability of researchers to find useful
insights in that data.
“The fundamental problem is loss
of accuracy of the data,” says Ruggles.
“In the case of tabular small-area
data, noise injection will blur the results, potentially leading investigators and planners to miss patterns in the data. For example, the noise injection could lead to underestimation of …”
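Ruggles’ concern is easiest to see on small tabulations, where noise that is negligible at state scale can swamp a block-level count. The short Python sketch below is purely illustrative; the privacy parameter and the counts are invented for the example and are not taken from the Census Bureau’s actual design.

import numpy as np

rng = np.random.default_rng(seed=0)
epsilon = 0.5                  # illustrative privacy parameter
scale = 1.0 / epsilon          # Laplace scale for a count of sensitivity 1

for true_count in (1_000_000, 20):   # a state-sized count vs. a tiny block
    noisy = true_count + rng.laplace(scale=scale)
    rel_error = abs(noisy - true_count) / true_count
    print(f"count={true_count:>9,}: relative error {rel_error:.3%}")

# The absolute noise is the same in both cases, so the small block's
# relative error is orders of magnitude larger.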
Ruggles also does not believe the
implementation of differential privacy
on U.S. Census data is even necessary.
“There has never been a documented
case of anyone’s identity being revealed in a public-use data product, so
it is a huge overreaction.”
Ullman, on the other hand, sees differential privacy as the best solution available to prevent database reconstruction attacks, while still keeping the data of the Census usable.
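The attacks Ullman has in mind exploit the fact that enough exact statistics about a small area can pin down the individual records behind them. The toy sketch below uses invented numbers and simple brute force; real reconstruction attacks, like the one described in the Garfinkel, Abowd, and Martindale article listed under Further Reading, scale the same idea up with integer programming.

from itertools import combinations_with_replacement

# Hypothetical exact statistics published for a three-person block:
# mean age 30, median age 30, youngest resident aged 18.
# Enumerate every combination of ages consistent with all three.
solutions = [
    ages
    for ages in combinations_with_replacement(range(101), 3)
    if sum(ages) == 90    # mean age 30 over three people
    and ages[1] == 30     # median age 30 (tuples are generated sorted)
    and ages[0] == 18     # minimum age 18
]
print(solutions)  # [(18, 30, 42)]: the block is fully reconstructed

Three published statistics fully determine all three records; adding calibrated noise to each statistic is exactly what breaks this kind of inference.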
Because the Census has an enormous dataset, Ullman says it is possible to release huge quantities of summary statistics with manageable amounts of noise. Differential privacy then quantifies how releasing additional summary statistics will increase privacy risks, making it possible to “weigh the harm to privacy against the public benefits in a sensible way.”
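The quantification Ullman describes rests on differential privacy’s composition property: in the basic sequential case, the privacy losses (the epsilons) of separate releases simply add up, so an agency can track them against a fixed budget. Here is a minimal sketch of that bookkeeping, with invented statistics and figures.

# Basic sequential composition: total privacy loss is at most the sum
# of the epsilons spent on the individual releases.
releases = {
    "block population count": 0.10,
    "median household size": 0.25,
    "age histogram": 0.25,
}
total_epsilon = sum(releases.values())
print(f"total privacy loss: {total_epsilon:.2f}")  # 0.60

PRIVACY_BUDGET = 1.0   # illustrative global budget
print(f"budget left for future releases: {PRIVACY_BUDGET - total_epsilon:.2f}")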
“There is simply no competing framework right now that has the potential to offer all of these benefits,” he says.
Differential privacy can ensure you strike the right balance between the noise in your data and the usefulness of your data. Researchers Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith presented a paper at the 2006 Theory of Cryptography Conference, “Calibrating Noise to Sensitivity in Private Data Analysis,” showing how to set up a mathematical system that allows parametric control over a quantifiable risk, formalizing the amount of noise that must be added to protect the data and proposing a generalized mechanism for doing so.
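The mechanism from that paper is simple to state: add noise drawn from a Laplace distribution whose scale is the query’s sensitivity (how much one person’s data can change the answer) divided by the privacy parameter epsilon. The following Python sketch shows the idea; it is a minimal illustration, not the Census Bureau’s implementation.

import numpy as np

def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
    # Dwork et al. (2006): noise with scale sensitivity/epsilon gives
    # epsilon-differential privacy for this query.
    rng = np.random.default_rng()
    return true_answer + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query changes by at most 1 when any one person is added
# or removed, so its sensitivity is 1.
noisy_count = laplace_mechanism(true_answer=1234, sensitivity=1.0, epsilon=0.5)

Smaller values of epsilon mean more noise and stronger privacy; larger values mean the reverse, which is the parametric control the paper formalizes.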
“It was specifically designed to provide mathematical assurances that
you had controlled the risk of database reconstruction, specifically that
you controlled the potential harm
from re-identification caused by an attacker building too accurate an external image of your data,” says Abowd.
This is why differential privacy was picked by the Census Bureau to defend its database, while also leaving the data sufficiently useful for researchers.
“It’s a mathematical framework
for understanding what ‘ensuring
privacy’ means,” says Ullman. “The
framework was specifically tailored
to understanding how to protect privacy in statistical analysis of large datasets, which is exactly the problem the Census faces.”
Abowd began experimenting with
differential privacy frameworks in
2008 as part of other work for the
Census Bureau, which produces a
number of data products aside from
the Census itself. However, it wasn’t
until 2016, after he conducted a database reconstruction attack on past Census data, that the need to use differential privacy on all Census data became clear.
Census Bureau management
agreed with Abowd that differential
privacy was the solution to the problem, so Abowd and a team of computer scientists and engineers got to
work implementing it.
Balancing Privacy with Usability
Abowd put together a team of computer scientists and engineers in
short order to combat the threat. The
team includes science lead Dan Kifer, a professor of computer science at Penn State University, and engineering lead Simson Garfinkel, previously a computer scientist at the National Institute of Standards and Technology (NIST). The team is currently working to apply differential privacy to the Census’ upcoming efforts for 2020.

Further Reading

Dwork, C., McSherry, F., Nissim, K., and Smith, A.
Calibrating Noise to Sensitivity in Private Data Analysis. In Halevi, S. and Rabin, T. (Eds.), Theory of Cryptography (TCC 2006), Lecture Notes in Computer Science, vol. 3876, Springer, Berlin, Heidelberg, 2006.

To Reduce Privacy Risks, the Census Plans to Report Less Accurate Data, The New York Times, Dec. 5, 2018.

Garfinkel, S., Abowd, J., and Martindale, C.
Understanding Database Reconstruction Attacks on Public Data, ACM Queue, Nov. 28, 2018.

Logan Kugler is a freelance technology writer based in Tampa, FL, USA. He has written for over 60 major publications.