tion, interaction, and movement—
including all types of other macro-level
descriptions of society and people and
the world—are now available to us.
As more data is collected from a growing pool of devices, has the individual
lost the right to information privacy?
MICHAEL STONEBRAKER: Imagine this
simple example: you show up at your
doctor’s office and have an x-ray done
and you want the doctor to run a query that shows who else has x-rays that
look like yours, what their diagnosis was, and what the morbidity of those
patients was. That requires integrating essentially the country’s entire online
medical databases and presumably
would extend to multiple countries
as well. While that is a daunting data
integration challenge, because every
hospital chain stores its data with different formats, different encodings
for common terms, etc., the social value gained from solving it is just huge.
But that also creates an incredibly difficult privacy problem, one that is not
a technical issue. Because if you’re
looking for an interesting medical
query, you’re not looking for common events; you’re looking for rare
events, and at least to my knowledge,
there aren’t any technical solutions
that will allow access to rare events
without indirectly disclosing who the
events belong to.
I view the privacy problem to be
basically a legal problem. We have
to have legal remedies in this area.
SINCE ITS INAUGURATION in 1966, the ACM A.M. Turing Award has recognized major contributions of lasting importance to computing.
Through the years, it has become the
most prestigious award in computing.
To help celebrate 50 years of the ACM
Turing Award and the visionaries who
have received it, ACM has launched
a campaign called “Panels in Print,”
which takes the form of a collection of
responses from Turing laureates, ACM
award recipients and other ACM experts on a given topic or trend.
For our fourth and final Panel in
Print, we invited 2014 ACM A.M. Turing Award recipient MICHAEL STONEBRAKER, 2013 ACM Prize recipient DAVID
BLEI, 2007 ACM Prize recipient DAPHNE
KOLLER, and ACM Fellow VIPIN KUMAR to
discuss trends in big data.
Gartner estimates that there are currently about 4.9 billion connected devices (cars, homes, appliances, industrial
equipment, among others) generating
data. This is expected to reach 25 billion
by 2020. What do you see as some of the
primary challenges and opportunities
this wave of data will create?
VIPIN KUMAR: One of the major challenges we are going to see is that the
data being gathered from these connected devices and sensors is very different from other datasets that our big
data community has had to deal with.
The biggest successes we have seen
for big data are in applications such as
Internet search, e-commerce, placement
of online ads, language translation,
image processing, and autonomous
driving. These successes have been
enabled, to a great extent, by the availability
of large, relatively structured datasets
that can be used to train a broad
range of machine learning algorithms.
But the data from multitudes of interconnected
devices, in its raw state,
can be highly fragmented, disparate
in space and time, and very heterogeneous.
Analyzing such data will be a big
and new technological challenge for
the machine learning and data mining
communities.
DAVID BLEI: The key idea here is that
just the data from something as simple as Netflix watching habits doesn’t
provide the recommendation of a new
movie; it’s that data alongside all the
data from everybody else that helps
make recommendations.
It’s an exciting world because we
are personalizing our interaction with
devices through the aggregate data
of everybody using their devices. Of
course, this all comes with a challenge
around privacy and what we give up
when we make our data available or
the spectrum of how much we can give
up against how much personalization
power we get in return.
The other opportunity is an unprecedented way to learn about the
world through these huge collections
of many individuals. This is a massive
dataset, and patterns of communica-
Big Data
DOI: 10.1145/3079064
David Blei, Daphne Koller, Michael Stonebraker, Vipin Kumar