a buyer can access the dataset by asking arbitrary “statistical queries,” and
can run any learning algorithm in the
“SQ model” to derive the answers to
other queries. 5 Can the large literature on SQ learning be used to give
arbitrage-free pricings for more general notions of arbitrage?
3. Seller profit: Arbitrage-free pricings give a family of dominant-strat-egy truthful mechanisms for selling
information. Suppose we know something about the distribution of buyer
demands—can we find the arbitrage-free pricing that maximizes seller
profit? This is particularly intriguing,
because it seems the seller can sometimes increase her profit by selling
noisier queries, thereby reducing the
complementarities among the goods
she is selling. The opportunity to increase profit by degrading product
quality rarely arises in markets for
This paper opens a rich research
direction. I recommend that new Ph.D.
students (or anyone looking for an
attractive problem) read it.
1. Dwork, C., McSherry, F., Nissim, K. and Smith, A.
Calibrating noise to sensitivity in private data analysis.
In Proceedings of the 3rd Theory of Cryptography
Conference, 2006, 265–284.
2. Fleischer, L. and Lyu, Y.-H. Approximately optimal
auctions for selling privacy when costs are correlated
with data. In Proceedings of the ACM Conference on
Electronic Commerce, 2012, 568–-585.
3. Ghosh, A. and Roth, A. Selling privacy at auction.
Games and Economic Behavior (2015), 334–346.
4. Ghosh, A., Ligett, K., Roth, A. and Schoenebeck,
G. Buying private data without verification. In
Proceedings of the ACM Conference on Economics and
Computation, 2014, 931–948.
5. Kearns, M. Efficient noise-tolerant learning from
statistical queries. J. ACM (1998), 983–1006.
6. Nissim, K., Vadhan, S. and Xiao, D. Redrawing the
boundaries on purchasing data from privacy-sensitive
individuals. Innovations in Theoretical Computer
Science (2014), 411–422.
Aaron Roth ( firstname.lastname@example.org) is the Class of 1940
Bicentennial Term Associate Professor at the University of
Pennsylvania, Philadelphia, PA.
Copyright held by author.
SELLING PERSONAL INFORMATION is very
different from selling physical goods,
and raises novel challenges. On the
sell-side of the market, individuals
own their own personal data and experience costs based on the usage of
their data insofar as that usage leads
to future quantifiable harm. On the
buy-side of the market, buyers are interested in “statistical information”
about the dataset, that is, aggregate
information, rather than information derived from a single individual. Differential privacy1 provides
a means to quantify the harm that
can come to individual data owners
as the result of the use of their data.
This ability to quantify harm allows
for data owners to be compensated
for the risk they incur. Past work
studying markets for private data
focused on the simple case in which
the buyer is interested in only the answer to a single linear function of the
data, 2, 3, 4, 6 which makes the buy-side of
the market particularly simple.
The following paper introduces
a fascinating and complicated issue that arises on the buy-side of the
market when buyers are interested in
multiple linear functions of the same
dataset. Information exhibits complementarities: given some information about a dataset, it is possible to
learn other things about the dataset.
This means that when pricing infor-
mation, there might be opportuni-
ties for arbitrage: rather than directly
buying the answer to the query he is
interested in, the buyer might instead
more cheaply buy a bundle of que-
ries that lets him deduce the answer
he is interested in. The authors give
conditions under which a pricing is
arbitrage free. This is a compelling
condition to ask for: it means that it is
a dominant strategy for arriving buy-
ers to faithfully request the answer to
the query they are interested in, rath-
er than trying to game the system. By
asking for arbitrage-free pricings, the
authors are making the market safe
Reasoning about these arbitrage
opportunities can be complicated:
if the values of the purchased linear
functions were revealed exactly, then
the answer to any other query in the
span of the purchased queries would
be derivable. But to guarantee the
sellers differential privacy, it is necessary to sell only noisy estimates of
the data. This makes reasoning about
what is derivable complex. Sensibly, since they are introducing a new
problem, the authors opt to study a
restricted notion of derivability and
arbitrage. They give pricings that rule
out arbitrage opportunities when the
buyer is only allowed to learn by taking linear combinations of observed
queries, is interested only in unbiased estimates of query values, and
will attempt arbitrage only at the level
of one query at a time. Because of the
richness of the authors’ problem, one
of the most exciting aspects of this
work is the doors it opens for future
exploration. Here, I will highlight
what I think are the most interesting
problems coming out of this paper:
1. Multi-query arbitrage: The paper
gives query pricings such that a buyer
can never more cheaply derive the
answer to a single query by buying a
different bundle of queries. However,
the buyer can still sometimes more
cheaply derive the answer to one
bundle of queries by buying the answers
to another bundle! Which pricings
can prevent this?
2. Arbitrage for biased estimators:
When buyers are only interested in
unbiased estimates, the best linear
unbiased estimator is always given by
least-squares linear regression. However, buyers can often improve the accuracy of their derivations by trading
off a small amount of bias for a large
reduction in variance. The space of
optimal estimators is much more
complex when they do so. In general,
(and Its Implications)
By Aaron Roth
To view the accompanying paper,