tial (Laplace) noise. Many other simple aggregations (e.g.,
Sum, Average, Median, among others) have similarly accurate randomized analogs.
To allow nonexpert analysts to produce new differentially private computations, we introduce the use of transformations, applied to data before differentially private
aggregations, without weakening the differential privacy guarantees. For several relational transformations,
a changed record in the input data set always results in
relatively few changes in the output data set. A differentially private analysis applied to transformed data masks
changes in the transformation’s output, and consequently
masks changes in its input as well. The composed transformation and analysis will provide differential privacy,
with a formal guarantee depending on the quantitative
form of “relatively few,” which we must determine for each
transformation. Such transformations can be composed
arbitrarily, by nonexpert analysts, and combined with
differentially private aggregations will serve as our query
Finally, any sequence of differentially private computations also provides differential privacy; the quantitative
privacy depletions are at worst additive (and occasionally
better), and can be tracked online. Consequently, we can
embed the query language above into any general purpose
programming language, allowing arbitrary use of the results
that return from the queries, as long as we monitor and constrain the total privacy depletion.
implementation of PinQ: We have implemented a prototype of PINQ based on C#’s Language Integrated Queries
(LINQ), an SQL-like declarative query language extension
to .NET languages. Data providers use PINQ to wrap arbitrary LINQ data sources with a specified differential privacy allotment for each analyst. Analysts then write arbitrary
C# programs, writing queries over PINQ data sources almost as if they were using unprotected LINQ data sources.
PINQ’s restricted language and run-time checks ensure
that the provider’s differential privacy requirements are
respected, no matter how an analyst uses these protected
At a high level, PINQ allows the analyst to compose arbitrary queries over the source data, whose quantitative differential privacy guarantees are evaluated before the query
is executed. If the analyst has framed a query whose privacy
cost falls within the bounds prescribed by the data providers, the query is executed and the privacy cost subtracted
from the amount available to the analyst for the associated
data sets. If the cost falls outside the bounds, PINQ does not
execute the query.
PINQ is designed as a thin layer in front of an existing query engine (Figure 1); it does not manage data or
execute queries. Instead, it supplies differentially private
implementations of common transformations and aggregations, themselves written in LINQ and executed by the
LINQ providers of the underlying data sets. This approach
substantially simplifies our implementation, but also allows
a large degree of flexibility in its deployment. A data source
only needs a LINQ interface to support PINQ, and we can
take advantage of any investment and engineering put in to
figure 1. PinQ provides a thin protective layer in front of existing
data sources, presenting an interface that appears to be that of the
raw data itself.
performant LINQ providers.
We stress that PINQ represents a very modest code base;
in its current implementation it is only 613 lines of C# code.
The assessment logic, following the math, is uncomplicated.
The aggregations must be carefully implemented to provide
differential privacy, but these are most often only a matter
of postprocessing the correct aggregate (e.g., adding noise).
PINQ must also ensure that the submitted queries conform to our mathematical model for them. LINQ achieves
substantial power by allowing general C# computations in
predicates of Where, functions of Select, and other operations. PINQ must restrict and shepherd these computations
to mitigate the potential for exploitation of side channels.
applications of PinQ: Programming with PINQ is done
through the declarative LINQ language, in an otherwise
unconstrained C# program. The analyst is not given direct
access to the underlying data; instead, information is extracted via PINQ’s aggregation methods. In exchange for
this indirection, the analyst’s code is allowed to operate on
unmasked, unaltered, live records.
With a few important exceptions, programs written with
PINQ look almost identical to their counterparts in LINQ.
The analysts assemble an arbitrary query from permitted transformations, and specify the accuracy for aggregations. Example 1 contains a C# PINQ fragment for counting
distinct IP addresses issuing searches for an input query
Example 1. Counting searches from distinct users in
We will develop this example into a more complex search
log visualization application showcasing several of PINQ’s
var data = new PINQueryable<SearchRecord>(... ...);
var users = from record in data
where record.Query == argv
Console.WriteLine(argv + “:” + users.Count(0.1));
advantages over other approaches: rich data types, complex
transformations, and integration into higher level applications, among many others. The full application is under 100
lines of code and took less than a day to write.
We have written several other examples of data analyses in PINQ, including k-means clustering, perceptron