quirements, projecting that in order to achieve acceptable performance with a 1TB data set we would need a GB/sec sequential read speed from the disks, translating to about 20 servers at the time. Jim was a firm believer in using “bricks,” or the cheapest, simplest building blocks money could buy. We started experimenting with low-level disk IO on our inexpensive Dell servers, and our disks were soon much quieter and performing more efficiently.
Toward the end of 2000, data started arriving from the SDSS telescope at Apache Point, NM (see Figure 1), and Jim said simply, “Let’s get to work.” So during Christmas and the New Year’s holiday we converted the whole object-oriented database schema to a Microsoft SQL Server-based schema. We modified many of our loading scripts by looking at Tom Barclay’s TerraServer code and soon, with Jim’s guidance, had a simple SQL Server version of the SDSS database.[24]
The SDSS project was at first reluctant to even consider switching technologies, so for about a year the SQL Server database we had designed was a “cowboy” implementation, not part of the official SDSS data release. Coincidentally, Intel gave us a pool of servers for experimenting with the database, giving us a show-and-tell meeting in San Francisco a few months after the first bits of data started to come in from the telescope. We decided to create a simple graphical interface on top of the database, similar to the one on the TerraServer, to enable astronomers and anyone else to visually browse the sky. My son, Tamas (13 at the time), came along to the Intel meeting and helped man the booth, telling us, “No self-respecting schoolkid would use such an interface,” and that it had to be much more visually stimulating and interactive.
Jim gave one of his characteristically big laughs; we then looked at one another and realized we had our target audience. Even if astronomers were not ready, we would design a database and integrated Web site for schoolchildren. This was the moment we set out to build the SkyServer to connect the database to the pixels in the sky. The name was an obvious play on TerraServer, and we pitched it to the SDSS project as a tool for education and outreach, as well as for serious scientific investigation. When the first batch of SDSS data was officially declared public in 2001, the SkyServer, then running on computers donated by Compaq, appeared side by side with the official database for astronomers. We wrote simple scripts to create false-color images from the raw astronomy data and adopted the TerraServer scheme to build an image pyramid consisting of successive sets of tiles at different magnifications.
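For readers unfamiliar with image pyramids, the idea can be illustrated with a minimal sketch. It is not the TerraServer or SkyServer code; the function names, the 2x2 averaging used to halve the resolution, and the assumption of a square image whose side is a power-of-two multiple of the tile size are all illustrative choices.

```python
from typing import Dict, List, Tuple

Image = List[List[float]]          # grayscale image as rows of pixel values
Tiles = Dict[Tuple[int, int], Image]


def downsample(img: Image) -> Image:
    """Halve the resolution by averaging each 2x2 block (even dimensions assumed)."""
    h, w = len(img), len(img[0])
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def cut_tiles(img: Image, tile: int) -> Tiles:
    """Cut an image into tile x tile squares keyed by (tile_row, tile_col)."""
    h, w = len(img), len(img[0])
    return {
        (ty, tx): [row[tx * tile:(tx + 1) * tile]
                   for row in img[ty * tile:(ty + 1) * tile]]
        for ty in range(h // tile)
        for tx in range(w // tile)
    }


def build_pyramid(img: Image, tile: int) -> List[Tiles]:
    """Return one tile set per magnification, from full resolution down to a single tile."""
    levels = [cut_tiles(img, tile)]
    while len(img) > tile and len(img[0]) > tile:
        img = downsample(img)
        levels.append(cut_tiles(img, tile))
    return levels


# Hypothetical 8x8 image cut into 2x2 tiles: three magnification levels.
image = [[float(x + y) for x in range(8)] for y in range(8)]
pyramid = build_pyramid(image, tile=2)
print([len(level) for level in pyramid])
```

A viewer zooming out simply moves one level up the pyramid and fetches far fewer tiles for the same patch of sky, which is what makes browsing at any magnification cheap.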
By the next year (2002), everyone realized that the SkyServer engine was much more robust and scalable than expected. Ani Thakar, a research scientist at Johns Hopkins, made a superhuman effort to convert the whole existing framework to SQL Server.[22] Jim insisted on “two-phase loading,” that is, we would load each new batch of data into its own separate little database, then run data-cleaning code and accept the data only if it passed all the tests. This foresight turned out to be enormously useful; once the data started coming through the hose, we could recover from errors (there were lots of them) much more easily. We soon had the framework and the ability to load hundreds of GB of data in a reasonable amount of time, marking the transition of the SkyServer team from cowboys to “ranchers.”
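The two-phase pattern is easy to sketch. The example below is only an illustration under stated assumptions, not the SDSS loader: it uses Python's built-in sqlite3 module in place of SQL Server, and the `objects` table, its columns, and the two range checks are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[int, float, float]  # hypothetical (obj_id, ra in degrees, dec in degrees)

SCHEMA = "CREATE TABLE objects (obj_id INTEGER PRIMARY KEY, ra REAL, dec REAL)"

# Hypothetical data-cleaning tests: each query counts offending rows; zero means pass.
CHECKS = {
    "ra in [0, 360)": "SELECT COUNT(*) FROM objects WHERE ra < 0 OR ra >= 360",
    "dec in [-90, 90]": "SELECT COUNT(*) FROM objects WHERE dec < -90 OR dec > 90",
}


def load_batch(main: sqlite3.Connection, rows: Iterable[Row]) -> bool:
    """Load one batch via a throwaway staging database; publish only if all checks pass."""
    # Phase 1: load the batch into its own separate little database.
    staging = sqlite3.connect(":memory:")
    try:
        staging.execute(SCHEMA)
        try:
            staging.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)
        except sqlite3.IntegrityError:
            return False  # e.g. duplicate obj_id inside the batch
        for sql in CHECKS.values():
            if staging.execute(sql).fetchone()[0] != 0:
                return False  # batch rejected; the main database is untouched
        # Phase 2: the batch passed every test, so copy it into the main database
        # inside a single transaction.
        good = staging.execute("SELECT obj_id, ra, dec FROM objects").fetchall()
        with main:
            main.executemany("INSERT INTO objects VALUES (?, ?, ?)", good)
        return True
    finally:
        staging.close()


main_db = sqlite3.connect(":memory:")
main_db.execute(SCHEMA)
print(load_batch(main_db, [(1, 10.0, 20.0), (2, 359.9, -89.0)]))  # clean batch
print(load_batch(main_db, [(3, 400.0, 0.0), (4, 1.0, 1.0)]))      # ra out of range
```

Because a bad batch never touches the main database, recovering from an error amounts to discarding the staging copy, fixing the input, and loading it again.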
Curtis Wong, manager of the Microsoft Next Media Research Group, then redesigned the SkyServer’s interface. His seemingly minor modifications of our style sheets had a huge effect on the entire site’s look and feel; it suddenly came alive. Many volunteers, including former Johns Hopkins student Steve Landy and physics teacher Rob Sparks, helped add content. Jordan Raddick, a science writer, created a new section of the Web site, with educational exercises and formal class materials for all students, from kindergarten to high school. Professional astronomers also appreciated the power of the visual tools, and the site quickly became popular, even in this community.
The next major step came with the emergence of Microsoft’s .NET Web services. Jim invited our development team (at Johns Hopkins) to San Francisco to the VSLive Conference (Janu-