comfort he did in astronomy and how his “20 queries” cut through the communication gap in the various communities. The same thing happened when he started to work with oceanographers from the Monterey Bay Aquarium Research Institute (www.mbari.org/) and the North-East Pacific Time-Series Undersea Networked Experiments project (www.neptuneproject.org/).

He was among the first computer scientists to realize how the data explosion changes not only science but scientific computing as well. As the amount of data grows faster than our ability to transfer it through the network, the only solution that promises to keep up is to take the computation directly to the data.20 This principle contrasts with recent trends in high-performance computing, where the machines are increasingly CPU-intensive while the ability to read and write data lags behind processing speed. Lively discussions with Jim and Gordon Bell of Microsoft Research about this problem resulted in a paper outlining what is wrong with today’s computing architectures2; I am immensely proud of having been a co-author. Our group at Johns Hopkins is now implementing the vision we outlined there, building a machine, called the GrayWulf (graywulf.org/) in Jim’s honor, that is specially tailored for data-intensive computations.

We realized that the data explosion in astronomy is due to the electronic charge-coupled device detectors that have replaced photographic plates. As semiconductor manufacturing matured, each year has brought a new generation of bigger and more sensitive detectors that could be replaced without affecting the telescopes themselves. Much as gene chips and gene sequencers have industrialized molecular biology, the revolution in Earth-observing satellite imagery has also been the result of better imaging devices. The common theme is that whenever an inexpensive sensing device is on an exponential growth path, a scientific revolution is imminent.

Such a revolution is taking place today with inexpensive wireless sensor networks, sometimes called “smart dust” after the University of California, Berkeley, project that first developed them almost a decade ago.9 It is expected that within the next five years there will be more sensors online than computers worldwide. Intel’s Berkeley Lab was among the first to develop such devices. My wife, Kathy Szlávecz, is a soil biologist interested in the soil ecosystem and has for years painstakingly sought and collected data involving environmental parameters. Jim connected her to the Berkeley lab, and after her seminar we came away with a shoebox full of Berkeley Motes (www.eecs.berkeley.edu/department/EECSbrochure/c6-s1.html). At the same time Johns Hopkins hired Andreas Terzis, a computer scientist specializing in wireless sensors, and thus a new collaboration (lifeunderyourfeet.org/) was formed. Despite having only a shoestring budget, it still managed to build a small sensor network to study soil moisture and temperature.

To Jim, there is nothing closer to the data than the database; thus the computations have to be done inside the database.

Jim realized that in this field of enviro-sensor networks, almost everyone focuses on the first phase of the problem: collecting data. In astronomy we have learned the hard way that with exponential data growth one should worry about data processing and analysis even at the beginning; otherwise, it will be difficult to catch up once the data stream really opens up.1

He was also very interested in the flexibility of the SkyServer framework. Another aspect of the environmental work is that scientists are interested in long-term trends and averages, even as they want to retain all the raw data and dive in whenever they find something unusual. We again went to work, converting the SkyServer framework in a matter of weeks into an end-to-end system to handle data from environmental science.23 We wrote code to handle time-series and in-database calibrations. Soon, we had help from Stuart Ozer from Microsoft Research, who built an OLAP data cube for the sensor data, the first ever (as far as we know) in a scientific application (see Figure 5).14
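To make the idea of in-database calibration and averaging concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is an illustration only, not the code used in the project: the table names (`raw_reading`, `calibration`), the columns, the linear gain-and-bias calibration, and the sample values are all assumptions made for this example. The raw readings stay untouched in their table, while calibrated daily means are computed by the database engine itself, so only the small summary leaves the database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical raw readings and calibrations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE raw_reading (sensor_id INTEGER, day TEXT, raw_value REAL);
        CREATE TABLE calibration (sensor_id INTEGER PRIMARY KEY, gain REAL, bias REAL);
        """
    )
    # Assumed linear calibration per sensor: calibrated = raw * gain + bias.
    conn.executemany(
        "INSERT INTO calibration VALUES (?, ?, ?)",
        [(1, 0.5, 1.0), (2, 2.0, 0.0)],
    )
    # Made-up raw readings; they are kept as-is so one can always dive back in.
    conn.executemany(
        "INSERT INTO raw_reading VALUES (?, ?, ?)",
        [
            (1, "2008-01-01", 10.0),
            (1, "2008-01-01", 20.0),
            (1, "2008-01-02", 30.0),
            (2, "2008-01-01", 1.0),
            (2, "2008-01-01", 3.0),
        ],
    )
    return conn


def daily_means(conn):
    """Calibrate and average inside the database; return only the summary rows."""
    query = """
        SELECT r.sensor_id,
               r.day,
               AVG(r.raw_value * c.gain + c.bias) AS mean_value,
               COUNT(*) AS n_readings
        FROM raw_reading AS r
        JOIN calibration AS c ON c.sensor_id = r.sensor_id
        GROUP BY r.sensor_id, r.day
        ORDER BY r.sensor_id, r.day
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for row in daily_means(connection):
        print(row)
```

The design choice mirrors the principle in the text: the join, the calibration arithmetic, and the aggregation all run where the data lives, and the client receives one row per sensor per day instead of every raw measurement.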

 

Collaborator and Friend

Over the years, as our collaboration intensified, our work days would start with Jim’s phone calls while he walked from home to BARC, followed by back-and-forth calls until early morning on the east coast (of the U.S.). Very often

References:

http://www.mbari.org/

http://www.neptuneproject.org/

http://graywulf.org/

http://lifeunderyourfeet.org/

http://www.eecs.berkeley.edu/department/EECSbrochure/c6-s1.html
