Relational database pioneer Patricia G. Selinger explores the vast realm of database technology and trends in a wide-ranging discussion with Microsoft’s James Hamilton.
IBM Fellow and retired director of the Database Technology Institute at IBM’s Almaden Research Center Patricia G. Selinger has made tremendous contributions to the database industry. After joining IBM Research in 1975, Selinger became a leading member of the team that built System R, the first proof that relational database technology was practical. Her innovative work on cost-based query optimization has been adopted by the majority of relational database vendors and is taught in most university database courses. Selinger managed the Almaden Computer Science department and established the Database Technology Institute, a joint program between IBM Research and the IBM software development team that accelerated advanced technology into data management products such as DB2 Universal Database for z/OS, IMS, and DB2 UDB on Linux, Windows, and Unix.
The interview presented here was conducted by James Hamilton, a member of Microsoft’s SQL Server Team and the former lead architect on DB2 and leader of the C++ compiler project at IBM. (This interview took place just prior to Selinger’s retirement from IBM in October 2005.)
JAMES HAMILTON Let’s start with the role of a query optimizer in a relational database management system and your invention of cost-based optimizers.
PAT SELINGER As you know, the fundamental tenet of a relational database is that data is stored in rows and columns. It’s value-based in that the values themselves stand up for, or represent, the data. No information is contained in pointers. All of the information is reflected in a series of tables, and the tables have a certain well-known shape and form: there’s an orders table, a customers table, an employees table, and so forth. Each of those tables has a fixed set of columns: the first name, the last name, the address.
Relational systems have a higher-level language called SQL, which is a set-oriented query language. This is a unique concept and really what distinguishes relational database systems from anything that came before or after.
The set-oriented concept of the query language allows asking for all the programmers who work in department 50; or all of the orders over $5,000; or all of the San Jose customers who have orders over $5,000; and so forth. The information in relational tables can be combined in many different ways, based on their values only.
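As an illustrative aside (not part of the interview), the three example questions above might be written in SQL roughly as follows. This is a minimal sketch using Python's built-in `sqlite3` module; the table names, column names, and sample rows are all hypothetical and chosen only to mirror the examples in the text.

```python
import sqlite3

# Hypothetical schema and sample data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, job TEXT, dept INTEGER);
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);

    INSERT INTO employees VALUES
        ('Ada', 'programmer', 50), ('Bob', 'analyst', 50), ('Cy', 'programmer', 20);
    INSERT INTO customers VALUES
        (1, 'Acme', 'San Jose'), (2, 'Bolt', 'Austin'), (3, 'Core', 'San Jose');
    INSERT INTO orders VALUES
        (10, 1, 7500.0), (11, 2, 9000.0), (12, 3, 1200.0), (13, 1, 300.0);
""")

# All the programmers who work in department 50.
programmers_in_50 = conn.execute(
    "SELECT name FROM employees "
    "WHERE job = 'programmer' AND dept = 50 ORDER BY name"
).fetchall()

# All of the orders over $5,000.
big_orders = conn.execute(
    "SELECT order_id FROM orders WHERE amount > 5000 ORDER BY order_id"
).fetchall()

# All of the San Jose customers who have orders over $5,000.
# The two tables are combined purely by matching values (cust_id).
san_jose_big = conn.execute("""
    SELECT DISTINCT c.name
    FROM customers AS c
    JOIN orders AS o ON o.cust_id = c.cust_id
    WHERE c.city = 'San Jose' AND o.amount > 5000
    ORDER BY c.name
""").fetchall()
```

Each query states only which rows are wanted, as a set; none of them says how to find those rows on disk.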
How do you take this very high-level set-oriented question that the user asks and turn it into an exact recipe for navigating the disk and getting the information from each of the different records within each of the different tables? This process is query optimization: the idea of mapping from the higher-level SQL down to the lower-level recipe or series of actions that you go through to access the data.
Query optimizers have evolved as an enabling technology to make this high-level programming language (this data-access language, SQL) work. Without that, you would end up using brute force: let’s look at each row and see if it matches the description of what’s asked for. Is it department 50? Is it an order that’s over $5,000? It would be very inefficient to scan all of the data all of the time.
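To make the cost of brute force concrete, here is a small illustrative sketch (not from the interview) that counts how many rows are examined when answering "who is in department 50?" by scanning everything versus by consulting a lookup structure. The data, the function names, and the in-memory dictionary standing in for an index are all assumptions made for illustration; real database systems use on-disk structures and far more elaborate access methods.

```python
from collections import defaultdict

# Hypothetical employee rows: (name, department number).
rows = [("emp%d" % i, i % 100) for i in range(10_000)]

def full_scan(rows, dept):
    """Brute force: examine every row and test the predicate."""
    examined = 0
    matches = []
    for row in rows:
        examined += 1
        if row[1] == dept:
            matches.append(row)
    return matches, examined

def build_index(rows):
    """Map each department value to the positions of its rows."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[1]].append(pos)
    return index

def index_lookup(rows, index, dept):
    """Touch only the rows the index points at."""
    positions = index.get(dept, [])
    return [rows[p] for p in positions], len(positions)

dept_index = build_index(rows)
scan_matches, scan_examined = full_scan(rows, 50)
idx_matches, idx_examined = index_lookup(rows, dept_index, 50)
```

Both approaches return the same rows, but the scan examines all 10,000 rows while the lookup touches only the matching ones. Choosing between alternatives like these is the kind of decision a query optimizer makes on the user's behalf.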
So we have access techniques that allow you to look at only a subset of the