Suppose a privacy mechanism M is applied to the data D, and the result M(D) is published. Given knowledge of M, a clever
adversary can design an attack algorithm A and run it on the published
data to obtain the result A(M(D)).
Note A(M(D)) is the result of applying the composite algorithm A ∘ M to the data D. Since ε-differential privacy is closed under post-processing, the composite algorithm A ∘ M still satisfies ε-differential privacy and hence has the same semantics; the output of A ∘ M is barely affected by the presence or absence of Bob's (or any other individual's) record in the database.
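To make closure under post-processing concrete, here is a minimal Python sketch; the laplace_count helper and the toy records are invented for illustration. The adversary's attack uses only the published output, so the composite A ∘ M retains the same ε guarantee as M.

    import numpy as np

    def laplace_count(data, predicate, epsilon):
        """epsilon-DP noisy count; a counting query has sensitivity 1."""
        true_count = sum(1 for record in data if predicate(record))
        return true_count + np.random.laplace(scale=1.0 / epsilon)

    def attack(released):
        """An adversary's post-processing A: it sees only M(D), so the
        composite A(M(D)) inherits the same epsilon guarantee."""
        return max(0, round(released))

    D = [{"name": "Bob", "disease": "stroke"},
         {"name": "Carol", "disease": "flu"}]
    published = laplace_count(D, lambda r: r["disease"] == "stroke", epsilon=0.5)
    print(attack(published))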
The second important consequence
of closure under post-processing is
how a data curator must express privacy definitions. Consider k-anonymity
and ε-differential privacy. By analogy
to database-query languages, the definition of k-anonymity is declarative;
that is, it specifies what we want from
the output but not how to produce this
output. On the other hand, differential privacy is more procedural, specifying constraints on the input/output behaviors of algorithms through
constraints on probabilities (such as
P(M(D) = ω)). This is no coincidence;
in order to achieve closure under post-processing, it is necessary that the
privacy definition impose conditions
on the probabilities (even when M is
deterministic) rather than on the syntactic form of the outputs.22
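As an illustration of such probabilistic constraints, the following sketch checks the ε-differential-privacy inequality P(M(D) = ω) ≤ e^ε · P(M(D′) = ω) for randomized response on a single bit; the mechanism and the value p = 0.75 are our own illustrative choices, not taken from the article.

    import math

    # Randomized response on one bit: report the true bit with probability p.
    p = 0.75
    epsilon = math.log(p / (1 - p))   # ln(3) for p = 0.75

    def prob(bit, omega):
        """P(M(bit) = omega) for the randomized-response mechanism M."""
        return p if omega == bit else 1 - p

    # The procedural constraint: for neighboring inputs (the bit flipped)
    # and every output omega, P(M(D) = omega) <= e^epsilon * P(M(D') = omega).
    for omega in (0, 1):
        assert prob(0, omega) <= math.exp(epsilon) * prob(1, omega) + 1e-12
        assert prob(1, omega) <= math.exp(epsilon) * prob(0, omega) + 1e-12
    print("constraints hold with epsilon =", epsilon)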
Composition. We introduce the concept of composition with an example. Suppose the 4-anonymous table in Figure 1 was generated from data held by Hospital A, while the 3-anonymous table in Figure 1 was generated by Hospital B. Suppose Alice knows her neighbor Bob was treated by both hospitals
for the same condition. What can Alice
infer about, say, Bob’s private records?
Bob corresponds to an anonymized
record in each table. By matching ZIP
code, age, and disease, Alice can deduce that Bob must have had a stroke.
Each anonymized table individually might have afforded Bob some privacy, but combining the two tables resulted in a privacy breach. The degradation in privacy that results from combining multiple sanitized outputs is known as "composition."14
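A sketch of Alice's intersection attack follows; the candidate disease sets are invented stand-ins, since the contents of Figure 1 are not reproduced here.

    # Invented stand-ins for Bob's anonymized groups in the two tables.
    hospital_a = {"stroke", "flu", "bronchitis", "diabetes"}  # 4-anonymous group
    hospital_b = {"stroke", "cancer", "heart disease"}        # 3-anonymous group

    # Bob was treated at both hospitals for the same condition, so his
    # disease must lie in the intersection of the two candidate sets.
    candidates = hospital_a & hospital_b
    print(candidates)  # {'stroke'}: the combination breaches Bob's privacy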
Self-composition. “Self-composi-
tion” refers to the scenario where the
sanitized outputs are all produced
from privacy mechanisms that satisfy
the same privacy definition. Funda-
mental limits on a privacy definition’s
ability to withstand composition are
part of a growing literature inspired
by the results of Dinur and Nissim7
who showed that the vast majority of
records in a database of size n can be
reconstructed when n log(n) 2 statisti-
cal queries are answered, even if each
answer has been arbitrarily altered to
have up to o( n ) error; that is, a distor-
tion that is less than the natural varia-
tion of query answers that an adversary
would get from collecting a sample of
size n from a much larger population.
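The following toy reconstruction conveys the flavor of this result; it substitutes a least-squares estimate for the linear program used in the original attack, and all parameters are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 64
    secret = rng.integers(0, 2, size=n)        # the private bits to reconstruct

    m = n * int(np.log2(n)) ** 2               # roughly n log^2(n) queries
    queries = rng.integers(0, 2, size=(m, n))  # random subset-sum queries
    noise = rng.integers(-4, 5, size=m)        # per-answer error well below sqrt(n)
    answers = queries @ secret + noise

    # A least-squares estimate rounded to {0, 1} recovers almost all records.
    estimate, *_ = np.linalg.lstsq(queries, answers, rcond=None)
    recovered = (estimate > 0.5).astype(int)
    print("fraction recovered:", (recovered == secret).mean())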
Despite such negative results that limit the number of times a private database can be queried safely, there can be a graceful degradation of privacy protections, as in the case of ε-differential privacy. If M1, M2, …, Mk are algorithms such that each Mi satisfies εi-differential privacy, then the combination of their sensitive outputs satisfies ε-differential privacy with ε = ε1 + … + εk;30 more formally, this privacy level is achieved by the algorithm M that runs mechanisms M1, M2, …, Mk on the input data and releases all their outputs. The end result thus does not reveal any record deterministically while still satisfying differential privacy, but with a linear degradation in the privacy parameter.
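A minimal sketch of this self-composition guarantee, assuming a standard Laplace mechanism for counting queries:

    import numpy as np

    def laplace_mech(value, sensitivity, epsilon):
        """Release value + Laplace(sensitivity/epsilon) noise: epsilon-DP."""
        return value + np.random.laplace(scale=sensitivity / epsilon)

    true_count = 120                   # a single sensitive statistic
    epsilons = [0.1, 0.3, 0.6]         # per-mechanism privacy parameters

    # M runs M1, ..., Mk and releases all outputs; by self-composition the
    # joint release satisfies (0.1 + 0.3 + 0.6) = 1.0-differential privacy.
    outputs = [laplace_mech(true_count, sensitivity=1, epsilon=e) for e in epsilons]
    print(outputs, "total epsilon:", sum(epsilons))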
Self-composition has another practical benefit: simplifying the design of privacy-preserving algorithms. Complicated mechanisms can be built modularly from simpler mechanisms in the same way software is built from functions. By controlling the information leakage of each component individually, a privacy-mechanism designer can control the information leakage of the entire system. In the case of ε-differential privacy, the privacy parameter ε of the final mechanism is at most the sum of the privacy parameters of its components.4,30
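One way to exploit this modularity is sketched below with a hypothetical PrivacyAccountant helper, which simply sums the per-component parameters against a total budget.

    class PrivacyAccountant:
        """Hypothetical helper: tracks the summed epsilon of components."""

        def __init__(self, budget):
            self.budget = budget   # total epsilon the curator will tolerate
            self.spent = 0.0

        def spend(self, epsilon):
            if self.spent + epsilon > self.budget:
                raise RuntimeError("privacy budget exhausted")
            self.spent += epsilon

    acct = PrivacyAccountant(budget=1.0)
    acct.spend(0.4)                # component mechanism M1
    acct.spend(0.5)                # component mechanism M2
    try:
        acct.spend(0.2)            # would push the total past 1.0
    except RuntimeError as err:
        print(err)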
Composition with other mechanisms.
The data curator must also consider
the effect on privacy when the mechanisms do not satisfy the same privacy
definition. As an example,24,26 consider a database where each record takes one of k values. Let x1, x2, …, xk
denote the number of times each of
these values appears in the database;
they are histogram counts. Let M1 be