micro-payments to the data owners, and each micro-payment
compensates a user according to her privacy loss.
An interesting open question is whether we can achieve
both truthfulness (e.g. as discussed in Ghosh and Roth12)
and arbitrage-freeness (as discussed in the current paper)
when pricing private data. Further, it remains to consider
general notions of answerability that go beyond linear
answerability, or to bound the impact that non-linear estimation
methods could have in the context of arbitrage. The framework's
assumptions of linear answerability and unbiased estimation
limit the class of adversaries against which the
arbitrage-free property is guaranteed to hold.
A more general approach would be to relax Definition 5 to
allow queries to be answered using either biased or unbiased
estimators and to include answerability using non-linear
functions. This would provide the market maker with a
stronger guarantee against arbitrage. However, it would likely
make reasoning about determinacy and arbitrage-free pricing significantly more difficult, and it would further restrict
the set of arbitrage-free pricing functions available to the
market maker. In other words, a more powerful notion of
arbitrage would lead to more restrictive pricing functions,
potentially limiting the ability of the market maker to set
prices. The tradeoff between completeness of the framework
and the feasibility of analyzing arbitrage is an important
topic for future research.
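To make the notion of arbitrage concrete, the following is a minimal Python sketch of the basic check in the current linear, unbiased setting. The price function `price(q, v) = ||q||² / v` and all numbers are illustrative assumptions, not the paper's actual pricing functions.

```python
# Sketch of an arbitrage check in the linear, unbiased setting.
# The price function pi(q, v) = ||q||^2 / v is a hypothetical example
# (noisier answers are cheaper); it is not the paper's actual pricing.

def price(q, v):
    """Price of linear query q answered with noise of variance v."""
    return sum(x * x for x in q) / v

# A buyer wants q3 = q1 + q2. Instead of buying q3 directly, she can buy
# q1 and q2 and sum the noisy answers: the sum is an unbiased linear
# estimator of q3, and independent noise variances add.
q1, q2, q3 = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
v1 = v2 = 1.0

p_direct = price(q3, v1 + v2)                 # buy q3 at the combined variance
p_indirect = price(q1, v1) + price(q2, v2)    # buy the parts and combine them

# Arbitrage-freeness demands that the direct purchase is never more
# expensive than any equivalent combination the buyer could assemble.
assert p_direct <= p_indirect   # no arbitrage via this particular route
```

Relaxing Definition 5 would enlarge the set of such combinations the market maker must check, which is exactly why the analysis becomes harder.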
A second issue is the problem of incentivizing the data
owner to participate in the database and truthfully report her
privacy valuations. Currently, there is nothing stopping the
data owner from quoting an impossibly high price for even
a tiny loss of her privacy. In other words, she would choose
a contract function W(ε) that is arbitrarily large.
Incentivizing users to report their true valuation is a goal
of mechanism design. This has been studied for private data
only in the restricted case of a single query, and has been
shown to be a difficult task.10, 12 In Li et al.,20 we make a
preliminary attempt to address this issue by requiring that
users choose from a limited set of contract functions (e.g.,
one appropriate for a risk-tolerant user and one appropriate
for a risk-averse user).
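A minimal sketch of such a restricted menu follows; the particular function shapes are illustrative assumptions, not the contract functions actually used in Li et al.20

```python
import math

# Two hypothetical contract functions W(eps): the micro-payment a data
# owner receives for a privacy loss of eps. The market maker offers only
# this fixed menu, so an owner cannot quote an effectively infinite
# price. The specific shapes below are illustrative only.

def w_risk_tolerant(eps, rate=1.0):
    """Linear contract: payment grows in proportion to privacy loss."""
    return rate * eps

def w_risk_averse(eps, rate=1.0):
    """Convex contract: large privacy losses are compensated
    disproportionately, shielding the owner from high-eps queries."""
    return rate * (math.exp(eps) - 1.0)

# For small eps the two contracts pay almost the same; for large eps
# the risk-averse contract dominates.
assert abs(w_risk_tolerant(0.01) - w_risk_averse(0.01)) < 0.001
assert w_risk_averse(3.0) > w_risk_tolerant(3.0)
```

Restricting owners to a small menu trades some expressiveness of privacy valuations for prices the market maker can actually reason about.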
A third issue is the protection of privacy valuations themselves. When a user has sufficient freedom to choose her
privacy valuation, it may be strongly correlated with the data
xi itself. In that case, even releasing the price of a query may
lead to privacy loss, a factor not considered in the framework
described above. Hiding the valuation itself is a difficult
problem which is still being actively investigated in mechanism design. 10, 12, 23 In, Ref. Li et al. 20 we describe a simple
approach that is based on perturbing the price itself, in the
same way that we perturb the data. Thus both π(Q) and µi(Q)
are perturbed in the same fashion as query answers, and are
therefore random variables. All three properties of arbitrage-freeness, cost-recovery, and compensation are then defined
in terms of expected values. In addition, the privacy loss for
data item xi includes two parts: one is due to the release of the
query answer and the other is due to the release of the price.
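A minimal sketch of price perturbation appears below. The use of Laplace noise and all parameter values are illustrative assumptions for this sketch; the actual mechanism and its guarantees are described in Li et al.20

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def laplace_noise(scale):
    """Sample Laplace(0, scale) noise via the inverse-CDF method."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def perturbed_price(true_price, scale):
    """Release a noisy price, just as query answers are released
    noisily; the scale bounds how much the price can leak about data."""
    return true_price + laplace_noise(scale)

# Cost-recovery now holds in expectation: the mean of many released
# prices converges to the true price pi(Q).
true_pi = 10.0
samples = [perturbed_price(true_pi, scale=2.0) for _ in range(200_000)]
mean_pi = sum(samples) / len(samples)
assert abs(mean_pi - true_pi) < 0.1   # E[noisy price] is close to pi(Q)
```

The same construction applies to each micro-payment µi(Q), which is why arbitrage-freeness, cost-recovery, and compensation all become statements about expected values.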
We have introduced a framework for selling private data.
Buyers can purchase any linear query, with any amount of
perturbation, and need to pay accordingly. Data owners, in
turn, are compensated according to the privacy loss they
incur for each query. Buyers are allowed to ask an arbitrary
number of queries, and we have designed techniques for
ensuring that the prices are arbitrage-free, according to
a specific definition of arbitrage-freeness, meaning that
buyers are guaranteed to pay for any information they may
further extract from the queries. Our pricing framework
is balanced, in the sense that the buyer’s price covers the
total compensation paid to the data owners for their privacy loss.
1. Acquisti, A., Taylor, C. R., Wagman, L.
The economics of privacy. Journal of
Economic Literature 52, 2 (2016).
2. Balazinska, M., Howe, B., Suciu, D. Data
markets in the cloud: An opportunity
for the database community. PVLDB 4,
12 (2011), 1482–1485.
3. Calo, R. The boundaries of privacy harm.
Indiana Law Journal 86, 3 (2011).
4. Chen, B.-C., Kifer, D., LeFevre, K.,
Machanavajjhala, A. Privacy-preserving
data publishing. In Foundations and
Trends in Databases (2010).
5. Dinur, I., Nissim, K. Revealing
information while preserving privacy.
In PODS (2003), 202–210.
6. Dwork, C. A firm foundation for
private data analysis. Commun. ACM
54, 1 (2011), 86–95.
7. Dwork, C., McSherry, F., Nissim, K.,
Smith, A. Calibrating noise to
sensitivity in private data analysis. In
TCC (2006), 265–284.
8. Dwork, C., Roth, A. The Algorithmic
Foundations of Differential Privacy.
Foundations and Trends in Theoretical
Computer Science (2014) Now
Publishers Inc., Hanover, MA, USA.
9. Erlingsson, U., Pihur, V., Korolova, A.
Rappor: Randomized aggregatable
privacy-preserving ordinal response.
In ACM Conference on Computer and
Communications Security (CCS) (2014).
10. Fleischer, L., Lyu, Y.-H. Approximately
optimal auctions for selling privacy
when costs are correlated with data.
In ACM Conference on Electronic
Commerce (2012), 568–585.
11. Fung, B.C.M., Wang, K., Chen, R.,
Yu, P.S. Privacy-preserving data
publishing: A survey of recent
developments. ACM Comput. Surv.
42, 4 (2010).
12. Ghosh, A., Roth, A. Selling privacy
at auction. In ACM Conference
on Electronic Commerce (2011).
13. Greenberg, A. Apple’s “differential
privacy” is about collecting your
data—but not your data. Wired (June 2016).
14. Halevy, A. Y. Answering queries using
views: A survey. VLDB J. 10, 4 (2001).
15. Hardt, M., Ligett, K., McSherry, F. A
simple and practical algorithm for
differentially private data release. In
NIPS (2012).
16. Kifer, D., Abowd, J., Gehrke, J., Vilhuber, L.
Privacy: Theory meets practice on the
map. In ICDE (2008).
17. Koutris, P., Upadhyaya, P., Balazinska, M.,
Howe, B., Suciu, D. Query-based data
pricing. J. ACM 62, 5 (Nov. 2015).
18. Laudon, K. C. Markets and privacy.
Commun. ACM 39, 9 (1996), 92–104.
19. Li, C., Hay, M., Rastogi, V., Miklau, G.,
McGregor, A. Optimizing linear
counting queries under differential
privacy. In PODS (2010).
20. Li, C., Li, D. Y., Miklau, G., Suciu, D. A
theory of pricing private data. ACM
Trans. Database Syst. 39, 4 (Dec. 2014).
21. Li, C., Miklau, G. Pricing aggregate
queries in a data marketplace. In
WebDB (2012).
22. Nash, A., Segoufin, L., Vianu, V.
Views and queries: Determinacy and
rewriting. TODS 35, 3 (2010).
23. Nissim, K., Vadhan, S., Xiao, D.
Redrawing the boundaries on
purchasing data from privacy-sensitive individuals. In Conference
on Innovations in Theoretical
Computer Science (2014), 411–422.
Chao Li, Google Inc.
Gerome Miklau, University of Massachusetts, Amherst, MA.
Daniel Yang Li and Dan Suciu,
University of Washington, Seattle, WA.
©ACM 0001-0782/17/1200 $15.00.