In creating a multimedia travelogue of my recent trip to Europe, I checked in to the most interesting places I visited using the location-based service Foursquare [1]. Launched in March 2009 and serving more than 50 million users, Foursquare is a social networking site that lets you bookmark (i.e., "check in" at) venues based on your geographic location.
As with all such services, the more information the system has about you, the better the interactions.
However, as much as I love it, Foursquare suffers from data sparsity and the closed-world problem for me: I'm an irregular user, so the service knows very little about me.
For example, in Cambridge I checked in to the River Cam, a river I rowed on frequently once upon a time, and Foursquare excitedly exclaimed, "Your first River!" Not so, Foursquare. Not my first river. I concede it is perhaps the first river we have shared together.
While this error is charming and amusing, information-poor user models can be dangerous. More trivially, they are a waste of our attentional resources, distracting us with irrelevant content. In the world of product recommendation, this manifests most irritatingly in recommendations for things we already own or would never purchase. Enough experiences like these with a service and one is likely to feel bemused at best, frustrated at worst. Users have a low threshold for how many poor experiences they are willing to endure before a service loses its allure. Such reduced engagement negatively impacts service viability from a business perspective. Thus, users and services have the same goal: to improve inference and recommendation quality. And that requires data, not just "what I did" data (behavioral and transaction logs) but also "why I did it" (motivation) data and "other things I'd like to do or explore" (aspiration) data.
Internet services that offer recommendations usually rely on "big data" and machine-learning algorithms to crunch the data to find patterns and make inferences and predictions. These are in some sense user models. But many such "user models" behind information targeting are not really focused on us as individual users, as people or persons. This is their power (these generic user models scale well) and also their weakness (none of us is entirely generic). These techniques fail in the face of data sparsity: without enough data to crunch on, there are no conclusions (or poor ones), no recommendations (or bad ones). This leads service providers to break down data silos by buying and selling data, entering into data-sharing agreements, and tracking users beyond their virtual walls using cookies and device identifiers [2, 3]. Even then, the algorithms are not able to get at the "why I did it" part, and they seldom ask the more important questions about the recommendations that result, such as: Did we get it wrong? And if so, how wrong? They are neither conversational nor collaborative.
While such practices could be construed as benign when it comes to generating targeted recommendations for products like soap powder, in more politically sensitive or safety-critical situations, our hackles are rightly raised. It isn't just about what may be found out; it is also that errors based on partial, incomplete, or erroneous data can lead to unwarranted conclusions with potentially serious negative consequences [4].
Yet we, as consumers who could provide the necessary information for higher-quality inferences and predictions, are neither inclined nor invited to work on creating truly personal data models. We don't get to curate the data about us, to manage our data. There are limited efforts in this regard in narrow domains (movie ratings being the most well known). And there are user profiles that we are often asked to fill out. However, personally, I feel disinclined to fill out profiles and comply with providing more data about myself. Why? Because when I have done so, I have not noticed a significant improvement in service provision, certainly not enough to have made my efforts worthwhile. I don't know how that information is used. I also don't know what is potentially at stake from handing over more information about myself; companies have not, to date, established a great track record for scrupulous behavior when it comes to personal data.
I am not alone in my reticence. Many people go further than I do: They actively resist data capture. They engage in data play to intentionally withhold personal information. They leverage the seams within and between services. They create multiple accounts and engage in data scrubbing. They disallow browser cookies [2] and flush browser caches. They avoid importing data from one social network to another. They are concerned about location tracking.
Consensus is building that transparency around personal-data collection and how the data is used would help build trust between corporations and users. Consumer-service corporations have a great opportunity to engage their users more effectively as protected and engaged personal-data curators.
Scrupulous, Scrutable, and
Sumptuous: Personal Data Futures
Elizabeth F. Churchill, eBay Research Labs
COLUMN Ps AND Qs