analyst with a rich set of transformations to apply to the data
set before differentially private aggregations.
Definition 2. We say a transformation T is c-stable if for any
two input data sets A and B,
Transformations with bounded stability propagate differential privacy guarantees made of their outputs back to
their inputs, scaled by their stability constant.
Theorem 2. Let M provide -differential privacy, and let T
be an arbitrary c-stable transformation. The composite computation M ° T provides ( × c)-differential privacy.
Once stability bounds are established for a set of transformations, a nonexpert analyst can combine any number
of them as they see fit. Differential privacy bounds result
from repeated application of Theorem 2, compounding the
stability constants of the applied transformations with the
value of the final aggregation.
example transformations: To give a sense for the types of
stability bounds to expect, we consider a few representative
transformations from LINQ.
The Where transformation takes an arbitrary predicate
over records, and results in the subset of records satisfying
the predicate. Any records in difference between A and B will
result in at most those records in difference between their
restrictions, resulting in a stability of one. Importantly, this
is true independent of the supplied predicate; the predicate’s logic can use arbitrarily sensitive information in the
records and will still have stability one.
The GroupBy transformation takes a key selection function, mapping records to some key type, and results in a set
of groups, one for each observed key, where each group contains the records mapped to the associated key value. For
every record in difference between A and B, a group in the
output can change. A change corresponds to symmetric difference two, not one; despite the apparent similarities in the
groups, subsequent logic (e.g., a Where) can treat the two
groups as arbitrarily different. As with Where, the stability
of two holds for any key selection function, including those
based on very sensitive fields or functions thereof.
The Union transformation takes a second data set, and
results in the set of elements in either the first or the second
data set. A record in difference between A and B results in no
more than one record in difference in the output, yielding
stability one. This is also true for records in difference in the
second data set, giving us an example of a binary transformation. A differentially private analysis of the result of a binary
transformation reflects information about both sources.
This is uncomplicated unless the inputs derive from common data. Even so, a single change to a data set in common
induces a bounded change in each of the transformation’s
inputs, and a bounded change in its output (i.e., the stability
The Join transformation takes a second data set, and
key selection functions for both data sets. It intends to
result in a set of pairs of records, one from each input, of
records whose keys match. A single record in either set could
match an unbounded number of records in the other set.
Consequently, this important transformation has no stability bound. As we discuss later, there are restricted forms
of Join that do have bounded stability (stability one, with
respect to both inputs), but their semantics deviate from the
unrestricted Join present in LINQ.
2. 4. Composition
Any sequence of differentially private computations also
provides differential privacy. Importantly, this is true even
when subsequent computations can depend arbitrarily on
the outcomes of the preceding computations.
Theorem 3. Let Mi each provide i-differential privacy. The
sequence of Mi(X) provides (Si i ) -differential privacy.
This simple theorem indicates that to track the cumulative privacy implications of several analyses, we need
not do anything more complicated than add the privacy
If the queries are applied to disjoint subsets of the input
domain we can improve the bound to the worst of the privacy guarantees, rather than the sum.
Theorem 4. Let Mi each provide -differential privacy.
Let Di be arbitrary disjoint subsets of the input domain D. The
sequence of Mi (X Ç Di ) provides -differential privacy.
Whereas sequential composition is critical for any
functional privacy platform, parallel composition is a very
important part of extracting good performance from a privacy platform. Although such operations could be analyzed
as sequential composition, the privacy guarantee would
scale with the number of subsets analyzed, often quite
2. 5. a privacy calculus
The mathematics of this section allows us to quantitatively
bound the privacy implications of arbitrary sequences of
rich transformations and aggregations. This simplicity
allows us to avoid burdening the analyst with the responsibility of correctly or completely describing the mathematical features of their query. Even for researchers familiar with
the mathematics (e.g., the author), the reasoning process
can be quite subtle and error-prone. Fortunately, it can be
automated, the subject of Section 3.
3. imPLementinG PinQ
PINQ is built atop C#’s LINQ. LINQ is a recent language
extension to the .NET framework integrating declarative
access to data streams (using a language very much like
SQL) into arbitrary C# programs. Central to LINQ is the
IQueryable<T> type, a generic sequence of records
of type T. An IQueryable admits transformations
such as Where, GroupBy, Union, Join, and more,
returning new IQueryable objects over possibly new
types. Only once an aggregation or enumeration is