where the probability space is over the coin flips of the algorithm M. The case when δ = 0 is sometimes referred to as
pure differential privacy, and in this case we may say simply
that M is ε-differentially private.
The concentration bounds we obtain for pure differential
privacy are almost as strong as those given by the standard
Chernoff–Hoeffding concentration inequalities.
Theorem 4. Let M be an ε-differentially private algorithm
that outputs a function from X to [0, 1]. For a random variable
S distributed according to P^n we let φ = M(S). Then
This statement also holds more broadly for an important
class of low-sensitivity functions. These are functions of a dataset that for some sensitivity ∆ satisfy | f (S) – f (S′)| ≤ ∆ for all datasets S, S′ ∈ X^n that differ in a single element. Note
that the sensitivity of the empirical average of any function
with range [0, 1] on a dataset of size n is at most 1/n.
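To make the role of sensitivity concrete, the following minimal Python sketch (not taken from the paper; the helper names are illustrative) releases a ∆-sensitive quantity via the standard Laplace mechanism, which adds noise of scale ∆/ε and is ε-differentially private; for the empirical average of a [0, 1]-valued query on n points, ∆ = 1/n.

```python
import numpy as np

def empirical_average(phi, S):
    # E_S[phi]: the average of a [0, 1]-valued query phi over the dataset S.
    return float(np.mean([phi(x) for x in S]))

def laplace_release(value, sensitivity, epsilon, rng=None):
    # Standard Laplace mechanism: adding Laplace noise of scale sensitivity/epsilon
    # to a quantity with the given sensitivity yields epsilon-differential privacy.
    rng = np.random.default_rng() if rng is None else rng
    return value + rng.laplace(scale=sensitivity / epsilon)

# Toy usage: the empirical average of a [0, 1]-valued query on n points has
# sensitivity 1/n, since replacing one element moves the average by at most 1/n.
rng = np.random.default_rng(0)
n = 1000
S = rng.uniform(size=n)                  # stand-in for n samples drawn from P
phi = lambda x: float(x > 0.5)           # a statistical query phi : X -> [0, 1]
answer = laplace_release(empirical_average(phi, S),
                         sensitivity=1.0 / n, epsilon=0.1, rng=rng)
```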
We outline the proof idea of this result in Section 3.1. A similar concentration result can also be obtained for (ε, δ)-differentially private algorithms, although it is not quite as strong and requires a substantially different and more involved proof. Our result for this case (see Ref. 9) has recently been strengthened and generalized to low-sensitivity queries using a new proof technique by Bassily et al.2
Theorem 4 implies that |P[φ] – ES[φ]| ≤ τ holds with
high probability whenever φ is generated by a differentially private algorithm M. This might appear to be different from what we need in our application because there
the queries are generated by an arbitrary (possibly adversarial) adaptive analyst and we only have control over
the query answering algorithm. The connection comes
from a crucial property of differential privacy, known as
its post-processing guarantee: Any algorithm that can be
described as the (possibly randomized) post-processing
of the output of a differentially private algorithm is itself
differentially private (e.g., Ref. 12). Hence, although we do
not know how an arbitrary analyst is adaptively generating her queries, we do know that if the only access she has
to S is through a differentially private algorithm, then her
method of producing query functions must be differentially private with respect to S. We can therefore, without
loss of generality, think of the query answering algorithm
and the analyst as a single algorithm M that is given a random dataset S and returns a differentially private output query φ = M(S).
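As a rough illustration of this composite view (a sketch under simplifying assumptions, with an invented toy analyst; this is not the paper's algorithm), the analyst below only ever sees the noisy answers released by a differentially private mechanism. Her choice of the next query is therefore a post-processing of differentially private outputs, and the whole interaction can be analyzed as a single algorithm M applied to S, with the privacy parameters accumulating across rounds by standard composition.

```python
import numpy as np

def dp_query_answer(S, phi, epsilon, rng):
    # Laplace mechanism for one statistical query: epsilon-DP per query, since the
    # empirical average of a [0, 1]-valued phi has sensitivity 1/len(S).
    return float(np.mean([phi(x) for x in S])) + rng.laplace(scale=1.0 / (len(S) * epsilon))

def run_adaptive_session(S, analyst, num_queries, epsilon_per_query, seed=0):
    # The analyst never touches S: each query is chosen from past noisy answers only,
    # so (by post-processing, plus composition over the rounds) the map from S to the
    # full transcript of queries and answers is itself differentially private.
    rng = np.random.default_rng(seed)
    answers = []
    for t in range(num_queries):
        phi = analyst(t, list(answers))
        answers.append(dp_query_answer(S, phi, epsilon_per_query, rng))
    return answers

def toy_analyst(t, past_answers):
    # Invented example: an adaptive threshold based on the largest answer seen so far.
    threshold = min(0.9, max(past_answers)) if past_answers else 0.5
    return lambda x: float(x > threshold)

S = np.random.default_rng(1).uniform(size=2000)   # stand-in for samples from P
answers = run_adaptive_session(S, toy_analyst, num_queries=20, epsilon_per_query=0.05)
```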
We also note that the bound in Theorem 4 controls the error probability of each individual query answer separately, rather than of all answers simultaneously. The bounds we state in Theorems 1 and 2 hold with high probability for all m queries; to obtain them from the bounds in this section, we apply the union bound.
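Concretely, if each of the m estimates satisfies |P[φi] – ES[φi]| ≤ τ except with probability at most β (β being a generic per-query failure probability, not a value from the paper), then the union bound gives Pr[some i ≤ m has |P[φi] – ES[φi]| > τ] ≤ mβ, so taking β proportional to 1/m makes all m estimates τ-accurate simultaneously with high probability.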
All we are missing now to get an algorithm for answering
adaptively chosen statistical queries is an algorithm that satisfies the following two conditions:
There is a very large body of work designing differentially
private algorithms for various data analysis tasks, some of
which we leverage in our applications (see Ref. 8 for a short survey and Ref. 12 for a textbook introduction to differential privacy).
For differentially private algorithms that output a hypothesis, it has long been folklore that differential privacy implies stability of the hypothesis to replacing (or removing)
an element of the input dataset. Such stability is long known to
imply generalization in expectation (e.g., Ref. 24). Our technique
can be seen as a substantial strengthening of this observation:
from expectation to high probability bounds (which is crucial
for answering many queries), from pure to approximate differential privacy (which is crucial for our improved efficient
algorithms), and beyond the expected error of a hypothesis.
Building on our work, Blum and Hardt5 showed how to
reuse the holdout set to maintain an accurate leaderboard
in a machine learning competition that allows participants to submit adaptively chosen models as the competition progresses (such as the competitions organized by Kaggle Inc.).
Finally, in a recent follow-up work, Bassily et al.2 strengthen the link between generalization and approximate differential privacy quantitatively and extend it to the
more general class of low-sensitivity queries. Their result
leads to bounds on the number of samples that are needed
to guarantee generalization that improve on our theorems
by a factor of
3. DIFFERENTIAL PRIVACY AND GENERALIZATION
Our results rely on a strong connection we make between
differential privacy and generalization. At a high level, we
prove that if M is a differentially private algorithm then the
empirical average of a function that it outputs on a random
dataset will be close to the true expectation of the function
with high probability over the choice of the dataset and the
randomness of M. More formally, for a dataset S = (x1, . . ., xn)
and a function φ : X → [0, 1], let ES[φ] = (φ(x1) + . . . + φ(xn))/n denote the empirical average of φ. We denote a random dataset chosen from P^n by S. Standard Chernoff–Hoeffding concentration
inequalities for sums of independent random variables
imply that for any fixed function φ, the empirical average
ES[φ] is strongly concentrated around its expectation P[φ].
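For intuition, the following simulation sketch (synthetic coin-flip data and an invented selection rule, purely illustrative) contrasts a fixed query, whose empirical average concentrates around P[φ] = 0.5 as just described, with a query selected using S itself, previewing the failure of concentration under adaptivity discussed next.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, trials = 1000, 50, 200     # n samples, d binary features, repeated draws of S

fixed_gaps, selected_gaps = [], []
for _ in range(trials):
    # Each sample is d independent fair coin flips, so every coordinate query
    # phi_j(x) = x[j] has true expectation P[phi_j] = 0.5.
    S = rng.integers(0, 2, size=(n, d)).astype(float)

    # Fixed query chosen before seeing S: Chernoff-Hoeffding concentration applies.
    fixed_gaps.append(abs(S[:, 0].mean() - 0.5))

    # Query selected by inspecting S (the coordinate with the largest empirical
    # average): its empirical average is biased upward relative to P[phi] = 0.5.
    j = int(np.argmax(S.mean(axis=0)))
    selected_gaps.append(abs(S[:, j].mean() - 0.5))

print("mean |ES[phi] - P[phi]|, fixed query:           ", np.mean(fixed_gaps))
print("mean |ES[phi] - P[phi]|, query selected using S:", np.mean(selected_gaps))
```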
However, this statement is no longer true if φ is allowed to depend on S, that is, if we choose functions adaptively using previous estimates on S. Nevertheless, for a hypothesis output by a differentially private M on S (denoted by φ = M(S)), we show that ES[φ] is close to P[φ] with high probability. Before making our statements formal, we review the definition of differential privacy.11
Definition 3. A randomized algorithm M with domain X^n is (ε, δ)-differentially private if for all O ⊆ Range(M) and for all pairs of datasets S, S′ ∈ X^n that differ in a single element:
Pr[M(S) ∈ O] ≤ e^ε · Pr[M(S′) ∈ O] + δ.
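As a minimal numeric illustration of this definition (randomized response on a one-element dataset, a standard textbook example rather than one of the paper's algorithms), the check below verifies the (ε, 0)-differential privacy inequality directly for the two neighboring datasets consisting of a single bit.

```python
import math

def randomized_response_distribution(bit, epsilon):
    # Report the true bit with probability e^epsilon / (1 + e^epsilon), else flip it.
    p_keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {bit: p_keep, 1 - bit: 1.0 - p_keep}

epsilon, delta = 0.5, 0.0
P0 = randomized_response_distribution(0, epsilon)   # output distribution on S  = (0)
P1 = randomized_response_distribution(1, epsilon)   # output distribution on S' = (1)

# Definition 3 with delta = 0: since the range is finite, it suffices to check every
# singleton output set O = {o} in both directions (tiny slack for floating point).
for o in (0, 1):
    assert P0[o] <= math.exp(epsilon) * P1[o] + delta + 1e-12
    assert P1[o] <= math.exp(epsilon) * P0[o] + delta + 1e-12
```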