Viewpoints
DOI: 10.1145/1409360.1409373
Interview
Database Dialogue
with Pat Selinger
Relational database pioneer Patricia G. Selinger explores the vast realm of database
technology and trends in a wide-ranging discussion with Microsoft’s James Hamilton.
IBM Fellow and retired director
of the Database Technology
Institute at IBM’s Almaden
Research Center Patricia G.
Selinger has made tremendous contributions to the database
industry. After joining IBM Research
in 1975, Selinger became a leading
member of the team that built System R, the first proof that relational
database technology was practical.
Her innovative work on cost-based
query optimization has been adopted
by the majority of relational database
vendors and is taught in most university database courses. Selinger managed the Almaden Computer Science
department and established the Database Technology Institute, a joint
program between IBM Research and
the IBM software development team
that accelerated advanced technology
into data management products such
as DB2 Universal Database for z/OS,
IMS, DB2 UDB on Linux, Windows,
and Unix.
The interview presented here was
conducted by James Hamilton, a
member of Microsoft’s SQL Server
Team and the former lead architect on
DB2 and leader of the C++ compiler
project at IBM. (This interview took place just prior to Selinger’s
retirement from IBM in October 2005.)
JAMES HAMILTON Let’s start with the
role of a query optimizer in a relational
database management system and your
invention of cost-based optimizers.
PAT SELINGER As you know, the fundamental tenet of a relational database is that data is stored in rows
and columns. It’s value-based in that
the values themselves stand up for—
represent—the data. No information
is contained in pointers. All of the information is reflected in a series of
tables, and the tables have a certain
well-known shape and form: there’s
an orders table, a customers table, an
employees table, and so forth. Each of
those tables has a fixed set of columns:
the first name, the last name, the address.
Relational systems have a higher-level
language called SQL, which is a set-oriented
query language. This is a unique
concept and really what distinguishes
relational database systems from anything that came before or after.
The set-oriented concept of the
query language allows asking for all
the programmers who work in department 50; or all of the orders over
$5,000; or all of the San Jose customers who have orders over $5,000; and
so forth. The information in relational
tables can be combined in many different ways, based on their values
only.
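A minimal sketch of the three set-oriented questions mentioned above, using Python's built-in sqlite3 module. The table layouts, column names, and sample rows are assumptions made purely for illustration; only the questions themselves come from the discussion.

```python
import sqlite3

# In-memory database with hypothetical tables and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, job TEXT, dept INTEGER);
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
INSERT INTO employees VALUES
    ('Ada', 'programmer', 50), ('Bob', 'clerk', 50), ('Cy', 'programmer', 20);
INSERT INTO customers VALUES (1, 'Acme', 'San Jose'), (2, 'Bolt', 'Austin');
INSERT INTO orders VALUES (10, 1, 7500.0), (11, 1, 300.0), (12, 2, 9000.0);
""")

# All the programmers who work in department 50.
programmers = conn.execute(
    "SELECT name FROM employees WHERE job = 'programmer' AND dept = 50"
).fetchall()

# All of the orders over $5,000.
big_orders = conn.execute(
    "SELECT order_id FROM orders WHERE amount > 5000 ORDER BY order_id"
).fetchall()

# All of the San Jose customers who have orders over $5,000.
# The two tables are combined by matching values (cust_id), not pointers.
san_jose_big = conn.execute("""
    SELECT DISTINCT c.name
    FROM customers AS c
    JOIN orders AS o ON o.cust_id = c.cust_id
    WHERE c.city = 'San Jose' AND o.amount > 5000
""").fetchall()

print(programmers, big_orders, san_jose_big)
```

Each query states what is wanted, not how to fetch it; choosing the access steps is left to the database system.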
How do you take this very high-level
set-oriented question that the user
asks and turn it into an exact recipe
for navigating the disk and getting the
information from each of the different
records within each of the different
tables? This process is query optimization: the idea of mapping from the
higher-level SQL down to the lower-level recipe or series of actions that
you go through to access the data.
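One way to see this mapping in practice is to ask a database for the recipe it has chosen. The sketch below uses SQLite's EXPLAIN QUERY PLAN statement through Python's sqlite3 module as a stand-in; it is not System R's optimizer, and the table, the index name idx_emp_dept, and the data are hypothetical. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(f"emp{i}", i % 100) for i in range(1000)],
)

def plan(sql):
    """Return the optimizer's chosen plan as text.

    The last column of each EXPLAIN QUERY PLAN row holds the
    human-readable description of one step.
    """
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT name FROM employees WHERE dept = 50"

# Same SQL text both times; only the available access paths differ.
plan_without_index = plan(query)
conn.execute("CREATE INDEX idx_emp_dept ON employees (dept)")
plan_with_index = plan(query)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The query text never changes; only the lower-level recipe does, which is the separation between the high-level question and the access steps described above.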
Query optimizers have evolved as
an enabling technology to make this
high-level programming language—
this data-access language, SQL—work.
Without that, you would end up using
brute force: let’s look at each row and
see if it matches the description of
what’s asked for. Is it department 50?
Is it an order that’s over $5,000? It
would be very inefficient to scan all of
the data all of the time.
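The brute-force approach can be sketched in a few lines of Python. The row layout and the sample data are assumptions for illustration; the point is only that every row is examined, no matter how few of them match.

```python
# Hypothetical table: 100 orders with amounts 100, 200, ..., 10000.
orders = [{"order_id": i, "amount": 100.0 * i} for i in range(1, 101)]

def brute_force_scan(rows, predicate):
    """Check every row against the predicate and count rows examined."""
    examined = 0
    matches = []
    for row in rows:
        examined += 1
        if predicate(row):
            matches.append(row)
    return matches, examined

# "Is it an order that's over $5,000?" asked of every single row.
matches, examined = brute_force_scan(orders, lambda r: r["amount"] > 5000)
print(len(matches), "matches after examining", examined, "rows")
```

The work grows with the size of the table rather than with the size of the answer, which is the inefficiency being described.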
So we have access techniques that
allow you to look at only a subset of the