Today’s data deluge is leading to new approaches
to visualize, analyze, and catalog enormous datasets.
Image courtesy of LSST Corporation
The amount of data available to scientists of nearly every discipline has almost become a “Can you top this?” exercise in numbers.
The Sloan Digital Sky Survey (SDSS),
for example, is often cited as a prime
example. Since the survey’s 2.5-meter
telescope first went online in 1998,
more than 2,000 refereed publications
have been produced, but they use just
10% of the survey’s available imaging data, according to a recent U.S.
National Science Foundation workshop on data-enabled science in the
mathematical and physical sciences.
Once the next-generation, state-of-the-art Large Synoptic Survey Telescope
(LSST) goes online in 2016, however, it
is estimated to be capable of producing a SDSS-equivalent dataset every
night for the next 10 years. Another often-cited example is the Large Hadron
Collider. It will generate two SDSS’s
worth of data each day.
On the surface, then, the scientific
community’s mandate seems clear:
create better computational tools to
visualize, analyze, and catalog these
enormous datasets. And to some extent, there is wide agreement these
tasks must be pursued.
[Image caption: The Large Synoptic Survey Telescope will have the ability to survey the entire sky in only…]
Some leading computational research scientists believe, however, that
progress in utilizing the vast expansion
of data will best be made on a project-by-project basis rather than through a pan-disciplinary computational blueprint.
“In theory, you might think we
should all be working together, and
the reality might be that each of the
people working on their own discipline
are achieving the results they need to
scientifically,” says Dan Masys, M.D.,
chairman of biomedical informatics at
Vanderbilt University. “There’s a cost
of communication that reaches an ir-
reducible minimum when you work