Doi: 10.1145/1400214.1400231
How he helped develop the SkyServer, delivering computation directly to terabytes of astronomical data.
Jim Gray worked with astronomers for more than a decade, right up to the time he went missing in 2007. My collaboration with him created some of the world’s largest astronomy databases and enabled us to test many unorthodox data-management ideas in practice. The astronomers collaborating with us have continued to be very receptive to these ideas, embracing Jim as a card-carrying member of their community. Jim’s contributions have left a permanent mark on astronomy worldwide, as well as on e-science in general.
[Photograph by Alexander Szalay]
Astronomy data has doubled in size every year for the past 20 years, due mostly to the emergence of electronic sensors. The largest sky survey of the past decade, the Sloan Digital Sky Survey, or SDSS (www.sdss.org), is often called the cosmic genome project. When it began in 1992, the size of the data set to be used for scientific analysis was measured in terabytes, shockingly large for the time. My group at Johns Hopkins University was selected by the SDSS Collaboration to build the science archive for the SDSS, a task we quickly realized would require a powerful search engine with spatial search capabilities. Our experimental system, based on object-oriented technologies, was good enough for us to understand how the eventual system should function, but we knew the final system would have to be different, most notably in query performance.
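The compounding in the first sentence of that paragraph is easy to underestimate. Taking the stated rate at face value (an annual doubling, sustained for 20 years), the cumulative growth factor works out as follows, roughly a millionfold increase in data volume over two decades:

```latex
2^{20} = 1{,}048{,}576 \approx 10^{6}
```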
One SDSS collaboration meeting in the mid-1990s took me to Seattle, where I had dinner with Charles Simonyi, then at Microsoft. He recognized the similarities between our problem and the Microsoft TerraServer (www.terraserver.com), which provides free online access to U.S. Geological Survey digital aerial photographs, and immediately called Jim to arrange a meeting. A few weeks later I flew to San Francisco and visited him at the Bay Area Research Center. Thus began a lively discussion about the TerraServer: how it could be turned inside out for a new (astronomical) purpose, and how spatial searches over the Earth were both similar to and different from spatial searches over the sky. We spent a full day dissecting the problem.
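Searching the sky means working with directions on a sphere, given as right ascension (RA) and declination (Dec), where a "find nearby objects" query is naturally a cone around a direction and RA wraps around at 360°. The following is a minimal Python sketch of such a cone search, assuming a small in-memory catalog of hypothetical (name, ra, dec) tuples in degrees. It uses a brute-force scan for clarity; it is not the SkyServer implementation, and a real archive would answer the same question through a spatial index rather than a linear scan.

```python
import math
from typing import List, Tuple

# Hypothetical catalog entry: (object name, RA in degrees, Dec in degrees).
CatalogEntry = Tuple[str, float, float]


def radec_to_unit(ra_deg: float, dec_deg: float) -> Tuple[float, float, float]:
    """Convert RA/Dec in degrees to a Cartesian unit vector on the celestial sphere."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def cone_search(catalog: List[CatalogEntry], ra_deg: float, dec_deg: float,
                radius_deg: float) -> List[str]:
    """Return names of catalog objects within radius_deg of (ra_deg, dec_deg)."""
    cx, cy, cz = radec_to_unit(ra_deg, dec_deg)
    # The dot product of two unit vectors is the cosine of their angular
    # separation, so "inside the cone" means dot >= cos(radius).
    # Working with vectors also handles the RA wrap-around at 360 degrees.
    min_cos = math.cos(math.radians(radius_deg))
    hits = []
    for name, ra, dec in catalog:  # linear scan; a real system would use an index
        x, y, z = radec_to_unit(ra, dec)
        if cx * x + cy * y + cz * z >= min_cos:
            hits.append(name)
    return hits


catalog = [
    ("a", 10.0, 20.0),
    ("b", 10.05, 20.02),
    ("c", 180.0, -30.0),
    ("d", 359.99, 0.0),
]
print(cone_search(catalog, 10.0, 20.0, 0.1))
print(cone_search(catalog, 0.0, 0.0, 0.05))
```

The second query is centered at RA 0° and still finds object "d" at RA 359.99°. A naive rectangle test on raw RA values would miss it.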
Jim asked about our “20 queries,” his incisive way of learning about an application and a deceptively simple way to jump-start a dialogue between him (a database expert) and me (an astronomer, or any scientist). Jim said, “Give me your 20 most important questions you would like to ask of your data system and I will design the system for you.” It was amazing to watch how well this simple heuristic, combined with Jim’s imagination, produced quick results.
Jim then came to Baltimore to look over our computer room and within 30 seconds declared, with a grin, that we had the wrong database layout. My colleagues and I were stunned. Jim explained later that he had listened to the sounds the machines were making as they operated; the disks rattled too much, telling him there was too much random disk access. We began mapping SDSS database hardware re-