ary 2002) where .NET was introduced and where our students entered a worldwide .NET programming contest, eventually coming in second. They created a set of services (called SkyQuery 3) that performed queries across geographically separate databases. At the same time, Jim built a prototype for the ImageCutout, a Web service building dynamic image mosaics, that became the core of the next-generation user interfaces we developed for the SkyServer to integrate images and database content. 19

Later, during a six-month sabbatical, Jim picked up a few astronomy textbooks, took them along on his sailboat, Tenacious, and while sailing quickly turned into a “native astronomer,” understanding the important concepts of astronomy. He thus enabled himself to participate in the reformulation of research ideas into elegant SQL, working with us side-by-side not only on database-related problems but on major-league astronomy research. We subsequently wrote many papers together where his ideas were quite relevant to astronomy. 18 At the same time he taught us database design and computer science and invited several of our students to be interns at BARC.

As Jim spent more and more of his time in astronomy, he noted on one of his famous PowerPoint slides concerning relational database design: “I love working with astronomers, since their data is worthless.” He meant it in the most complimentary sense, that the data could be freely distributed and shared, since there were no financial implications or legal constraints. He went on to participate in many SDSS meetings, becoming a much-beloved and highly respected member of the astronomical community. His contributions are indeed very much appreciated, and in recognition of his work an asteroid is about to be named for him by the International Astronomical Union.

Soon after the SkyServer was launched in 2001, it was obvious that astronomers would want to perform a variety of spatial searches for objects in the sky. The survey also had a rather complex geometry, and in order to describe it we would need an extensive framework for spatial operations. Over the next few years (2002–2006), with several of my students and postdocs (particularly Peter Kunszt) we wrote, again with Jim’s guidance, a fast package for spatial searches called Hierarchical Triangular Mesh (see Figure 2). 17 We also built an interface to SQL Server and were soon performing blazingly fast searches over the sky. This emerged as one of the most notable features of the SkyServer. The tools eventually also made it into the shrink-wrap package of SQL Server 2005 as a demo on how to interface SQL to external software. 4, 8

Jim was excited about these spatial computations, since they demonstrated one of his main convictions: that when you have lots of data, you take the computations to the data rather than the data to the computations. To Jim, there is nothing closer to the data than the database; thus the computations have to be done inside the database. 9

As spatial searches grew in complexity, it became apparent that we would need even more extensive processing capabilities. Besides indexed searches, we needed better ways to represent complex polygons on the sphere. We ended up combining two complementary approaches. In one representation, polygons were represented as intersections of the unit sphere with a 3D polyhedron. The polyhedron was delimited by planes, each defining a half-space, so each convex polyhedron could be built from the intersection of a set of these half-spaces. This turned out to be handy for testing a point against a polygon in SQL. The inside test compared the dot product of the Cartesian vector describing the point and the normal vector of the half-space with the distance of the plane from the origin. 9 The dual representation (in terms of arcs) formed the outlines of the polygons (see Figure 3). We built tools to perform the set algebra of spherical polygons, including morphological operations over the sphere: a complex computational geometry library, all in SQL. I can think of no other person who would have thought of such an idea, much less been able to implement it. The library was subsequently converted to C# by Tamas Budavari and George Fekete of Johns Hopkins, though much of the code in SkyServer remains Jim’s original.
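The half-space inside test described above can be illustrated with a minimal Python sketch. It is not the SkyServer SQL or the later C# library. The function names, the (normal, offset) tuple layout, and the example region are assumptions made only for illustration. A point, given as a unit vector, lies inside a half-space when its dot product with the plane normal is at least the plane's distance from the origin, and it lies inside a convex region when that holds for every half-space.

```python
import math


def radec_to_unit_vector(ra_deg, dec_deg):
    """Convert right ascension and declination (degrees) to a Cartesian unit vector."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def in_halfspace(point, normal, offset):
    """True if the point is on the inner side of the plane normal . x = offset."""
    dot = sum(p * n for p, n in zip(point, normal))
    return dot >= offset


def in_convex(point, halfspaces):
    """A convex region is an intersection: the point must pass every half-space test."""
    return all(in_halfspace(point, n, c) for n, c in halfspaces)


# Hypothetical example region: a 10-degree cap around the north pole,
# intersected with the hemisphere x >= 0.
cap = ((0.0, 0.0, 1.0), math.cos(math.radians(10.0)))
hemisphere = ((1.0, 0.0, 0.0), 0.0)
convex = [cap, hemisphere]

inside = in_convex(radec_to_unit_vector(0.0, 85.0), convex)
wrong_hemisphere = in_convex(radec_to_unit_vector(180.0, 85.0), convex)
outside_cap = in_convex(radec_to_unit_vector(0.0, 70.0), convex)
```

Because each test is a single dot product and a comparison, the same check maps naturally onto a relational query in which every half-space is a row and a point passes when no row rejects it.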

Jim realized there are two different types of spatial problems: one related to a localized, relatively small region, the other to a fuzzy (probabilistic) spatial join over much of the celestial sphere. He came up with the idea of using latitude zones and wrote the whole query, joining two tables with hundreds of millions of rows, as a single SQL statement, letting the optimizer do its magic in terms of parallelizing

 

Figure 3: Typical spherical polygon (orange) describing an area of uniform target selection for follow-up spectroscopic observation in the Sloan Digital Sky Survey arising from the intersection of geometries in the survey area.
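The latitude-zone idea for joining two large catalogs can be sketched as follows. This is only an illustration in Python, not Jim's single SQL statement. The function names, the tuple layout (id, ra, dec), and the choice of a fixed match radius equal to the zone height are all assumptions, and the probabilistic weighting of a fuzzy join is left out. Objects are bucketed into horizontal declination stripes, so each object needs to be compared only with candidates in its own and the two adjacent zones, then filtered by a right-ascension window before the exact distance test.

```python
import math
from collections import defaultdict


def zone_id(dec_deg, zone_height_deg):
    """Index of the declination stripe (zone) containing dec_deg."""
    return int(math.floor((dec_deg + 90.0) / zone_height_deg))


def separation_deg(ra1, dec1, ra2, dec2):
    """Angular distance in degrees between two sky positions (haversine formula)."""
    d_ra = math.radians(ra2 - ra1)
    d_dec = math.radians(dec2 - dec1)
    a = (math.sin(d_dec / 2) ** 2
         + math.cos(math.radians(dec1)) * math.cos(math.radians(dec2))
         * math.sin(d_ra / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def ra_window_deg(dec_deg, radius_deg):
    """Largest RA offset a match can have; it widens toward the poles."""
    if abs(dec_deg) + radius_deg >= 90.0:
        return 180.0  # the search circle touches a pole: any RA is possible
    ratio = math.sin(math.radians(radius_deg)) / math.cos(math.radians(dec_deg))
    return math.degrees(math.asin(min(1.0, ratio)))


def zone_cross_match(left, right, radius_deg):
    """Pair objects from two catalogs that lie within radius_deg of each other."""
    # Zone height equals the radius, so matches can only sit in adjacent zones.
    zones = defaultdict(list)
    for obj_id, ra, dec in right:
        zones[zone_id(dec, radius_deg)].append((obj_id, ra, dec))

    matches = []
    for id1, ra1, dec1 in left:
        window = ra_window_deg(dec1, radius_deg)
        z = zone_id(dec1, radius_deg)
        for neighbor in (z - 1, z, z + 1):
            for id2, ra2, dec2 in zones.get(neighbor, ()):
                d_ra = abs(ra1 - ra2) % 360.0
                d_ra = min(d_ra, 360.0 - d_ra)  # handle the 0/360 wraparound
                if d_ra > window:
                    continue  # cheap coarse filter before the exact test
                sep = separation_deg(ra1, dec1, ra2, dec2)
                if sep <= radius_deg:
                    matches.append((id1, id2, sep))
    return matches


# Hypothetical toy catalogs; the radius is 0.001 degrees (3.6 arcseconds).
left = [("a", 10.0, 20.0), ("b", 359.9995, -5.0)]
right = [("x", 10.0001, 20.0001), ("y", 0.0003, -5.0), ("z", 50.0, 20.0)]
matches = zone_cross_match(left, right, 0.001)
```

In a database the zone number and right ascension would typically form an index, so the coarse filters become range scans and the join can be expressed declaratively and left to the query optimizer.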
