comfort he did in astronomy and how
his “20 queries” cut through the communication gap in the various communities. The same thing happened when
he started to work with oceanographers
from the Monterey Bay Aquarium Research Institute (www.mbari.org/) and
the North-East Pacific Time-Series Undersea Networked Experiments project
(www.neptuneproject.org/).
He was among the first computer
scientists to realize how the data explosion changes not only science but
scientific computing as well. As the
amount of data grows faster than our
ability to transfer it through the network, the only solution that promises
to keep up is to take the computation
directly to the data. 20 This principle
contrasts with recent trends in high-performance computing where the
machines are increasingly CPU-intensive, while the ability to read and write
data lags behind processing speed.
Lively discussions with Jim and Gordon Bell of Microsoft Research about
this problem resulted in a paper outlining what is wrong with today’s computing architectures2; I am immensely
proud of having been a co-author. Our
group at Johns Hopkins is now implementing the vision we outlined there,
building a machine, called the Gray Wulf in Jim’s
honor (graywulf.org/),
specially tailored for data-intensive
computations.
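The principle of taking the computation to the data can be illustrated with a minimal sketch. It is not the Gray Wulf design or any code from the paper mentioned above; it uses Python's built-in sqlite3 module, and the table name `measurements`, its columns, and the synthetic values are assumptions made purely for illustration. The first function ships every row to the client before averaging, while the second asks the database for the average and receives a single row.

```python
import sqlite3


def build_demo_db(n_rows=1000):
    """Create an in-memory database holding a hypothetical table of values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
    # Synthetic data: the values cycle through 0.0 .. 9.0.
    conn.executemany(
        "INSERT INTO measurements (id, value) VALUES (?, ?)",
        ((i, float(i % 10)) for i in range(n_rows)),
    )
    conn.commit()
    return conn


def mean_by_moving_data(conn):
    """Move the data to the computation: fetch every row, then average."""
    rows = conn.execute("SELECT value FROM measurements").fetchall()
    total = sum(value for (value,) in rows)
    return total / len(rows), len(rows)  # (mean, rows transferred)


def mean_by_moving_computation(conn):
    """Move the computation to the data: the database does the averaging."""
    rows = conn.execute("SELECT AVG(value) FROM measurements").fetchall()
    return rows[0][0], len(rows)  # (mean, rows transferred)


if __name__ == "__main__":
    conn = build_demo_db()
    print("client-side:  ", mean_by_moving_data(conn))
    print("in-database:  ", mean_by_moving_computation(conn))
```

Both functions return the same mean; what differs is how many rows cross the boundary between database and client, which is the quantity that matters once the data outgrow the network.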
We realized that the data explosion
in astronomy is due to the electronic
charge-coupled device detectors that
have replaced photographic plates.
As semiconductor manufacturing
matured, each year has brought a new
generation of bigger and more sensitive detectors that could be replaced
without affecting the telescopes
themselves. Much as gene chips and
gene sequencers have industrialized
molecular biology, the revolution in
Earth-observing satellite imagery has
also been the result of better imaging
devices. The common theme is that
whenever an inexpensive sensing device is on an exponential growth path,
a scientific revolution is imminent.
Such a revolution is taking place
today with inexpensive wireless sensor networks, sometimes called
“smart dust” after the University of
California, Berkeley, project that first
developed them almost a decade ago.

To Jim, there is nothing closer to the data than the database; thus the computations have to be done inside the database. 9

It is expected that within the next
five years there will be more sensors
online than computers worldwide.
Intel’s Berkeley Lab was among the
first to develop such devices. My wife,
Kathy Szlávecz, is a soil biologist interested in the soil ecosystem and has
for years painstakingly sought and
collected data involving environmental parameters. Jim connected her to
the Berkeley lab, and after her seminar
we came away with a shoebox full of
Berkeley Motes (www.eecs.berkeley.edu/department/EECSbrochure/c6-s1.html). At the same time Johns
Hopkins hired Andreas Terzis, a computer scientist specializing in wireless
sensors, and thus a new collaboration
(lifeunderyourfeet.org/) was formed.
Despite having only a shoestring budget, it still managed to build a small
sensor network to study soil moisture
and temperature.
Jim realized that in this field of enviro-sensor networks, almost everyone
focuses on the first phase of the problem—collecting data. In astronomy we
have learned the hard way that with
exponential data growth one should
worry about data processing and analysis even at the beginning; otherwise,
it will be difficult to catch up once the
data stream really opens up. 1
He was also very interested in the
flexibility of the SkyServer framework.
Another aspect of the environmental work is how interested scientists
are in long-term trends and averages,
even as they want to retain all the raw
data and dive in whenever they find
something unusual. We again went to
work, converting in a matter of weeks
the SkyServer framework into an end-to-end system to handle data from environmental science. 23 We wrote code
to handle time-series and in-database
calibrations. Soon, we had help from
Stuart Ozer from Microsoft Research,
who built an OLAP data cube for the
sensor data, the first ever (as far as we
know) in a scientific application (see
Figure 5). 14
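The combination described here, calibration done inside the database, long-term averages, and raw data kept for closer inspection, can be sketched in a few lines. This is an illustration only, not the SkyServer-derived system or the OLAP cube itself: it uses Python's built-in sqlite3 module, a plain GROUP BY roll-up stands in for a real data cube, and every table, column, calibration constant, and reading below is a made-up assumption.

```python
import sqlite3

# Hypothetical schema: raw counts, per-sensor linear calibration constants,
# and a view that applies the calibration inside the database.
SCHEMA = """
CREATE TABLE raw_readings (sensor_id INTEGER, ts TEXT, raw_count REAL);
CREATE TABLE calibration (sensor_id INTEGER PRIMARY KEY, gain REAL, bias REAL);
CREATE VIEW calibrated AS
    SELECT r.sensor_id, r.ts, r.raw_count * c.gain + c.bias AS value
    FROM raw_readings AS r
    JOIN calibration AS c ON r.sensor_id = c.sensor_id;
"""


def build_sensor_db():
    """Load a tiny synthetic time series into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO calibration VALUES (?, ?, ?)",
        [(1, 0.5, 1.0), (2, 2.0, 0.0)],
    )
    conn.executemany(
        "INSERT INTO raw_readings VALUES (?, ?, ?)",
        [
            (1, "2008-06-01 00:00:00", 10.0),
            (1, "2008-06-01 12:00:00", 14.0),
            (1, "2008-06-02 00:00:00", 20.0),
            (2, "2008-06-01 00:00:00", 3.0),
        ],
    )
    conn.commit()
    return conn


def daily_means(conn):
    """Roll calibrated values up to one average per sensor per day."""
    return conn.execute(
        "SELECT sensor_id, date(ts) AS day, AVG(value), COUNT(*) "
        "FROM calibrated GROUP BY sensor_id, date(ts) "
        "ORDER BY sensor_id, day"
    ).fetchall()


def raw_for_day(conn, sensor_id, day):
    """Drill down: the untouched raw readings behind one daily average."""
    return conn.execute(
        "SELECT ts, raw_count FROM raw_readings "
        "WHERE sensor_id = ? AND date(ts) = ? ORDER BY ts",
        (sensor_id, day),
    ).fetchall()


if __name__ == "__main__":
    conn = build_sensor_db()
    for row in daily_means(conn):
        print(row)
    print(raw_for_day(conn, 1, "2008-06-01"))
```

Because the calibration lives in a view rather than in overwritten values, changing a gain or bias later recalibrates every average without touching the raw readings, which remain available whenever something unusual shows up in the roll-up.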
Collaborator and Friend
Over the years, as our collaboration
intensified, our work days would start
with Jim’s phone calls while he walked
from home to BARC, followed by back-and-forth calls until early morning on
the East Coast (of the U.S.). Very often