5. RELATED WORK
The analysis of sensitive data under the constraints of confidentiality has been the subject of a substantial amount of prior research; for an introductory survey we refer the reader to Adam and Wortmann,1 but stress that the field is still very much evolving. For an introduction to differential privacy we refer the reader to Dwork.6
While PINQ is the first platform we are aware of providing differential privacy guarantees, several other interactive data analysis platforms have been proposed as an
approach to providing privacy guarantees. Such platforms
are generally built on the principle that aggregate values
are less sensitive than individual records, but are very aware
that allowing an analyst to define an arbitrary aggregation
is very dangerous. Various and varying criteria are used to determine which aggregates an analyst should be permitted to compute. To the best of our knowledge, none of these systems has provided quantifiable end-to-end privacy guarantees.
Recent interest in differential privacy for interactive systems appears to have started with Mirkovic,13 who proposed using differential privacy as a criterion for admitting analyst-defined aggregations. The work defines an
analysis language (targeted at network trace analysis) but
does not go so far as to specify semantics that provide
formal differential privacy guarantees. It seems possible
that PINQ could support much of the proposed language
without much additional work, with further trace-specific
transformations and aggregations added as extensions to PINQ.
Airavat14 is a recent analogue of PINQ for Map-Reduce
computations. The authors invest much more effort in
hardening the system, securing the computation through
the use of a mandatory access control operating system
and an instrumented Java virtual machine, as well as PINQ-style differential privacy mathematics. At the same time,
it seems that the resulting analysis language (one Map-Reduce stage) is less expressive than LINQ. It remains to be
seen to what degree the system level guarantees of Airavat
can be fruitfully hybridized with the language level restriction used in PINQ.
6. CONCLUSIONS
We have presented “Privacy Integrated Queries” (PINQ), a
trustworthy platform for privacy-preserving data analysis.
PINQ provides private access to arbitrarily sensitive data,
without requiring privacy expertise of analysts or providers. The interface and behavior are very much like that
of Language Integrated Queries (LINQ), and the privacy
guarantees are the unconditional guarantees of differential privacy.
PINQ presents an opportunity to establish a more formal
and transparent basis for privacy technology and research.
PINQ’s contribution is not only that one can write private
programs, but that one can write only private programs.
Algorithms built out of trusted components inherit privacy
properties structurally, and do not require expert analysis
and understanding to safely deploy. This expands the set
of capable users of sensitive data, increases the portability
of privacy-preserving algorithms across data sets and
domains, and broadens the scope of the analysis of sensitive data.
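The structural nature of these guarantees can be made concrete with a small sketch. The following Python fragment is an illustration only: PINQ itself is a C# library, and the names `PrivateCollection`, `where`, and `noisy_count` are assumptions of this sketch, not PINQ's actual API. It shows the key idea that every aggregation draws down a shared privacy budget, so any program written against such a wrapper is private by construction.

```python
import random

class PrivateCollection:
    """A minimal sketch of PINQ-style budget accounting (illustrative
    names; not the actual C# PINQ API)."""

    def __init__(self, records, budget=0.0, _account=None):
        self.records = list(records)
        # All views derived from the same data share one budget account,
        # so spending through any view depletes the common budget.
        self._account = _account if _account is not None else {"remaining": budget}

    def where(self, predicate):
        # Transformations cost no budget; they only shape later aggregations.
        return PrivateCollection((r for r in self.records if predicate(r)),
                                 _account=self._account)

    def noisy_count(self, epsilon):
        # Aggregations spend budget: structurally, no sequence of calls
        # can reveal more than the total epsilon granted at construction.
        if epsilon > self._account["remaining"]:
            raise ValueError("privacy budget exhausted")
        self._account["remaining"] -= epsilon
        # Count has sensitivity 1, so Laplace noise with scale 1/epsilon
        # suffices; the difference of two iid exponentials with rate
        # epsilon is Laplace-distributed with that scale.
        noise = random.expovariate(epsilon) - random.expovariate(epsilon)
        return len(self.records) + noise
```

An analyst holding such an object can filter and count freely, and the total differential privacy cost is bounded by the initial budget no matter what program they write; no expert review of the program is needed.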
6.1. Availability
The prototype of PINQ used for the experiments in this paper, along with further example programs and a brief tutorial, is available at http://research.microsoft.com/
Acknowledgments
The author gratefully acknowledges the contributions of several collaborators. Ilya Mironov, Kobbi Nissim, and Adam
Smith have each expressed substantial interest in and support
for privacy tools and technology usable by nonexperts. Yuan
Yu, Dennis Fetterly, Úlfar Erlingsson, and Mihai Budiu helped
tremendously in educating the author about LINQ, and have
informed the design and implementation of PINQ. Many readers and reviewers have provided comments that have substantially improved the presentation of this paper.
References
1. Adam, N.R., Wortmann, J.C. Security-control methods for statistical databases: A comparative study. ACM Comput. Surv. 21, 4 (1989).
2. Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K. Privacy, accuracy, and consistency too: A holistic solution to contingency table release. In PODS (2007), 273–282.
3. Barbaro, M., Zeller Jr., T. A face is exposed for AOL searcher no. 4417749. The New York Times, August 9, 2006.
4. Blum, A., Dwork, C., McSherry, F., Nissim, K. Practical privacy: The SuLQ framework. In PODS (2005), 128–138.
5. Dwork, C. Differential privacy. In ICALP (2006), 1–12.
6. Dwork, C. A firm foundation for private data analysis. Communications of the ACM, Association for Computing Machinery, Inc., 2010.
7. Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M. Our data, ourselves: Privacy via distributed noise generation. In Eurocrypt (2006).
8. Dwork, C., McSherry, F., Nissim, K., Smith, A. Calibrating noise to sensitivity in private data analysis. In TCC (2006), 265–284.
9. Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D. Dryad: Distributed data-parallel programs from sequential building blocks. In EuroSys, ACM.
10. McSherry, F. Privacy integrated queries: An extensible platform for privacy-preserving data analysis. In SIGMOD Conference (2009), 19–30.
11. McSherry, F., Talwar, K. Mechanism design via differential privacy. In FOCS.
12. McSherry, F., Talwar, K. Synthetic data via differential privacy.
13. Mirkovic, J. Privacy-safe network trace sharing via secure queries. In NDA.
14. Roy, I., Setty, S.T., Kilzer, A., Shmatikov, V., Witchel, E. Airavat: Security and privacy for MapReduce. In NSDI Conference (2010).
15. Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, Ú., Gunda, P.K., Currey, J. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In OSDI (2008).
Frank McSherry (mcsherry@microsoft.com), Microsoft Research, SVC, Mountain