The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we’ll publish
selected posts or excerpts.
Follow us on Twitter at http://twitter.com/blogCACM
Michael Stonebraker issues a call to arms about research
groups’ data-management problems. Jason Hong discusses
the nature of functionality with respect to design.
Michael Stonebraker
“Big Data, Big Problems”
http://cacm.acm.org/blogs/blog-cacm/103932
January 14, 2011
I was at a conference recently and talked with a science professor at another university. He made the following startling statement.
He has close to 1 petabyte (PB) of
data that he uses in his research. In
addition, he surveyed other scientific research groups at his university
and found 19 other groups, each with
more than 100 terabytes (TB) of data.
In other words, 20 research groups at
his university have datasets between
100TB and 1PB in size.
I immediately said, “Why not ask
your university’s IT services to stand
up a 20-petabyte cluster?”
His reply: “Nobody thinks they are
ready to do this. This is research computing,
very different from regular IT.
The trade-offs for research computing
are quite different from corporate IT.”
I then asked, “Why not put your
data up on EC2?” (EC2 is Amazon’s
Elastic Compute Cloud service.)
His answer: “EC2 storage is too expensive
for my research budget; you
essentially have to buy your storage every
month. Besides, how would I move
1PB to Amazon? Sneaker net [disks
sent to Amazon via U.S. mail] is not
As a result, he is in the process of
starting a 20-research-group federation
that will stand up the required
server. In other words, this consortium
will run its own massive data server.
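To put the question of moving 1PB to Amazon in perspective, here is a rough back-of-envelope sketch. The link speeds are illustrative assumptions, not figures from the professor or from Amazon, and the calculation assumes the link is fully saturated the whole time, which real transfers rarely manage.

```python
# Back-of-envelope: time to move a dataset over a sustained network link.
# The link speeds below are illustrative assumptions, not figures from the post.

PETABYTE_BYTES = 10**15  # decimal petabyte


def transfer_days(size_bytes: int, link_bits_per_second: float) -> float:
    """Days needed to send size_bytes at a fully sustained link rate."""
    seconds = size_bytes * 8 / link_bits_per_second
    return seconds / 86_400  # seconds per day


# Hypothetical sustained rates, chosen only for illustration.
ASSUMED_LINKS = {"100 Mbit/s": 100e6, "1 Gbit/s": 1e9, "10 Gbit/s": 10e9}

if __name__ == "__main__":
    for label, rate in ASSUMED_LINKS.items():
        print(f"{label}: {transfer_days(PETABYTE_BYTES, rate):,.1f} days")
```

Under these assumptions, a fully used 1 Gbit/s link would need roughly three months to move 1PB, which suggests why the network was not an obvious alternative to shipping disks.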
Disclosure: Michael Stonebraker is associated with
four startups that are either producers or consumers
of database technology. Hence, his opinions should be
considered in this light.