invoked is any computation performed; until this point
the IQueryable only records the structure of the query
and its data sources.
PINQ’s implementation centers on a PINQueryable<T>
generic type, wrapped around an underlying
IQueryable<T>. This type supports the same methods
as an IQueryable, but with implementations ensuring
that the appropriate privacy calculations are conducted
before any execution is invoked. Each PINQueryable is
comprised of a private member data set (an IQueryable),
and a second new data type, a PINQAgent, responsible for
accepting or rejecting requested increments to epsilon.
Aggregations test the associated PINQAgent to confirm
that the increment to epsilon is acceptable before they
execute. Transformations result in new PINQueryable
objects with a transformed data source and a new
PINQAgent, containing transformation-appropriate logic
to forward modified epsilon requests to the agents of its
source PINQueryable data sets.
The PINQAgent interface has one method,
Alert(epsilon), invoked before executing any differentially private aggregation with the appropriate value of epsilon, to confirm access. For PINQueryable objects wrapped
around raw data sets, the PINQAgent is implemented by
the data provider based on its privacy requirements, either
from scratch or using one of several defaults (e.g., decrementing a per-analyst budget). For objects resulting from
transformations of other PINQueryable data sets, PINQ
constructs a PINQAgent which queries the PINQAgent
objects of the transformation’s inputs with transformation-appropriate scaled values of epsilon. These queries are be
forwarded recursively, with appropriate values of epsilon,
until all source data have been consulted. The process is
sketched in Figure 3.
aggregation only if the eventual response is positive.
Count is implemented as per Theorem 1, returning
the accurate count of the underlying data plus Laplace
noise whose magnitude is specified by the analyst, if large
enough. Example 2 depicts the implementation of Count.
Example 2. [Abbreviated] Implementation of Count.
PINQ includes other aggregations—including Sum,
Average, and Median among others—each of which takes
epsilon and a function converting each record to a double.
To provide differential privacy, the resulting values are first
double Count(double epsilon)
if (epsilon > 0.0 && myagent.Alert(epsilon))
return mysource.Count() + Laplace( 1.0/epsilon);
clamped to the interval [− 1, + 1] before they are aggregated.
This is important to ensure that a single record has only a
limited impact on the aggregate, allowing a relatively small
perturbation to provide differential privacy.
The implementations of these methods and the proofs
of their privacy guarantees are largely prior work. Sum, like
Count, is implemented via the addition of Laplace noise
and is discussed in Dwork et al. 8 Average and Median
are implemented using the exponential mechanism
of McSherry and Talwar, 11 and output values in the range
[− 1, + 1] with probabilities
3. 1. aggregation operators
Each aggregation in PINQ takes epsilon as a parameter and provides -differential privacy with respect to its
immediate data source. The privacy implications may be
far worse for the underlying data sets from which this data
set derives, and it is important to confirm the appropriately
scaled amount of differentially private access. Before execution, each aggregation invokes the Alert method of their
associated PINQAgent with this epsilon, conducting the
figure 3. PinQ control/data flow. an analyst initiates a request to
a PinQ object, whose agent (a) confirms, recursively, differentially
private access. once approved by the providers’ agents, data
(D) flows back through trusted code ensuring the appropriate level
of differential privacy.
Each downweights the probability of a possible output x by
(the exponentiation of) the fewest modifications to the input
A needed to make x the correct answer.
The accuracy of Average is roughly 2/ divided by the
number of records in the data set. Median results in a value
which partitions the input records into two sets whose sizes
differ by roughly an additive 2/; it need not be numerically
close to the actual median.
3. 2. transformation operators
PINQ’s flexibility derives from its transformation operators,
each of which results in a new PINQueryable wrapped
around an updated data source. The associated PINQAgent
is wired to forward requests on to the participating source
data sets before accepting, scaling epsilon by the transformation’s stability constant.
Our implementations of many transformations are
mostly a matter of constructing new PINQueryable and
PINQAgent objects with the appropriate parameters.
Some care is taken to restrict computations, as discussed in
Section 3. 4. Example 3 depicts the implementation of PINQ’s
GroupBy. Most transformations require similarly simple privacy logic.