A Modern Solution to a Modern Problem
By law, the U.S. Census Bureau is prohibited from identifying “the data
furnished by any particular establishment or individual.” That is why the
Census Bureau publishes summary
data, or a high-level view of the sex,
age, race, and other household details
of Americans by state.
The main data product that comes
out of the Census is Summary File
1, which constitutes the “main dissemination of census results,” says
Abowd. Summary File 1 contains a lot
of data that demographers use, like
age, race, and ethnicity segmented by
gender, as well as household composition statistics.
According to the Census Bureau,
Summary File 1 “includes population
and housing characteristics for the
total population, population totals
In 2020, the people of the U.S. will stand up and be counted, according to the provisions in the U.S. Constitution that stipulate a census must take place every decade. It’s a tradition dating back
to 1790, when the first national census was taken.
This tradition is turning to a newer
technique to stay secure in the 21st century.
Back in 2003, researchers Irit Dinur
and Kobbi Nissim of the NEC Research
Institute published a paper explaining how they had identified theoretical vulnerabilities in the summary data
published with confidential databases.
In some cases, the researchers found,
the summary data—a high-level picture
of the data from individual records in a
database—could be used to reconstruct
the private database. That meant attackers could use the public summary
of the data to reconstruct what people
had disclosed privately.
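
To make the idea concrete, here is a toy sketch of a reconstruction attack in Python. It is an illustration only, not the method used by Dinur and Nissim or by the Census Bureau: the block of three residents, the published statistics, and the function names are all invented for this example, and a real attack would hand the constraints to a solver rather than enumerate every possibility.

```python
from itertools import combinations_with_replacement
from statistics import median

# Hypothetical published summary for one tiny block (made-up numbers).
PUBLISHED = {
    "count": 3,              # number of residents
    "mean_age": 30,          # mean age of all residents
    "median_age": 24,        # median age of all residents
    "minors": 1,             # residents under 18
    "mean_adult_age": 36.5,  # mean age of residents 18 and over
}


def matches(ages, stats):
    """Return True if a candidate set of ages agrees with every statistic."""
    if sum(ages) / len(ages) != stats["mean_age"]:
        return False
    if median(ages) != stats["median_age"]:
        return False
    adults = [a for a in ages if a >= 18]
    if len(ages) - len(adults) != stats["minors"]:
        return False
    if not adults:
        return False
    return sum(adults) / len(adults) == stats["mean_adult_age"]


def reconstruct(stats, max_age=115):
    """Brute force: try every multiset of ages and keep the consistent ones."""
    candidates = combinations_with_replacement(range(max_age + 1), stats["count"])
    return [ages for ages in candidates if matches(ages, stats)]


solutions = reconstruct(PUBLISHED)
print(solutions)
```

In this made-up case only one set of ages fits all five published numbers, so the summary alone reveals every resident's age. Each additional published table narrows the candidates further, which is why detailed summaries of small populations are the most exposed.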
On paper, these types of database
reconstruction attacks presented a
possible threat to confidential databases that published summary data.
The U.S. Census is a prime example of
such a database.
For a long time, the paper remained a
warning about a theoretical threat until
the last decade, when a dramatic increase in both computer speed and the
efficiency of NP-hard problem solvers
turned the theoretical threat into a practical peril, according to research published by U.S. Census Bureau employees.
One of those employees, John
Abowd, associate director for research
and methodology at the Bureau, worked
with a team to investigate whether advances in computing power could enable database reconstruction attacks on
the U.S. Census.
The results were shocking.
Abowd and his team retroactively
used database reconstruction techniques
on these public data summaries,
and found they could use
advanced computational power and
techniques to recreate private data that
was never meant to be public.
In fact, Abowd and his team found
they could reconstruct all the records
contained in the database with approximately
50% accuracy. When they
allowed a small error in the age of an
individual, the accuracy with which
they could associate public data with
individuals went up to 70%. And if they
allowed getting one piece of personal
information like race or age wrong, but
everything else right, their reconstruction
was more than 90% accurate.
“The vulnerability is not a theoretical
one; it’s an actual issue. The systems
being used [for the census] were
vulnerable,” says Abowd.
The solution, it turns out, was just
as modern as the problem.
the 2020 Census
A new framework is being used to secure the 2020 U.S. Census
from database reconstruction attacks.
Society | DOI: 10.1145/3329719 Logan Kugler