Laboratory at the University of Wisconsin-Madison. “It was a complete home
run.” The paper led to her election to
the National Academy of Engineering in
1999, won her a slew of awards (
including the SIGMOD Edgar F. Codd Innovations Award in 2002), and remains the
seminal work on query optimization in
relational systems.
Propelled by Selinger’s new ideas,
System R, Ingres, and their commercial
progeny proved that relational systems
could provide excellent performance.
IBM’s DB2 edged out IMS and IDMS on
mainframes, while Ingres and its derivatives had the rapidly growing DEC VAX
market to themselves. Soon, the database wars were largely over.
1980s in the form of object-oriented databases (OODBs), but they never caught
on. There weren’t that many applications for which an OODB was the best
choice, and it turned out to be easier
to add the key features of OODBs to
the relational model than to start from
scratch with a new paradigm.
More recently, some have suggested
that the MapReduce software framework, patented by Google this year, will
supplant the relational model for very
large distributed data sets. [See “More
Debate, Please!” by Moshe Y. Vardi on p.
5 of the January 2010 issue of
Communications.] Clearly, each approach has its
advantages, and the jury is still out.
As RDBMSs continue to evolve,
scientists are exploring new roads of
inquiry. Fagin’s key research right now
is the integration of heterogeneous
data. “A special case that is still really
hard is schema mapping—converting
data from one format to another,” he
says. “It sounds straightforward, but
it’s very subtle.” DeWitt is interested
in how researchers will approach the
“unsolved problem” of querying geographically
distributed databases,
especially when the databases are created
by different organizations and
are almost but not quite alike. And
Ramakrishnan of Yahoo! is investigating
how to maintain databases in the
cloud, where service vendors could
host the databases of many clients.
“So ‘scale’ now becomes not just data
volume, it’s the number of clients, the
variety of applications, the number of
locations and so on,” he says. “Manageability
is one of the key challenges
in this space.”
Faster Queries
During the 1980s, DeWitt found another way to speed up queries against relational databases. His Gamma Database
Machine Project showed it was possible
to achieve nearly linear speed-ups by
using the multiple CPUs and disks in a
cluster of commodity minicomputers.
His pioneering ideas about data partitioning and parallel query execution
found their way into nearly all commercial parallel database systems.
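As a rough illustration of data partitioning and parallel query execution, the sketch below hash-partitions rows on a key, computes a partial aggregate on each partition independently, and merges the partial results. It is not the Gamma design itself: the function names, the `region` and `amount` columns, and the use of a thread pool to stand in for separate machines are all assumptions made for the example, and Python threads show the structure of the computation rather than a real speed-up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, n_partitions):
    """Assign each row to a partition by hashing its key column."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def local_sum(partition):
    """Per-partition work: total amount by region, using only local rows."""
    totals = Counter()
    for row in partition:
        totals[row["region"]] += row["amount"]
    return totals


def parallel_sum_by_region(rows, n_partitions=4):
    """Run the same aggregate on every partition, then merge the partials."""
    partitions = hash_partition(rows, "region", n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(local_sum, partitions))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values key by key
    return dict(merged)
```

Because each partition is processed without reference to the others, adding partitions (and workers) divides the per-worker load, which is the intuition behind near-linear speed-up.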
“If the database community had not
switched from CODASYL to relational,
the whole parallel database industry
would not have been possible,” DeWitt
says. The declarative, not imperative,
programming model of SQL greatly facilitated his work, he says.
The simplicity of the relational
model held obvious advantages for users,
but it had a more subtle benefit as
well, IBM’s Fagin says. “For theorists
like me, it was much easier to develop
theory for it. And we could find ways to
make the model perform better, and
ways to build functions into the model.
The relational model made collaboration
between theorists and practitioners
much easier.”
Indeed, theorists and practitioners
worked together to a remarkable degree,
and operational techniques and
applications flowed from their work.
Their collaboration resulted in, for example,
the concept of locking, by which
concurrent users were prevented
from updating the same record at the same time.
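The idea of locking can be sketched in a few lines. The example below is a minimal illustration, not how any particular database implements it: the `Record` class, the `deposit` function, and the one-lock-per-record scheme are assumptions chosen for clarity. Each thread must hold the record's lock before changing it, so concurrent updates are applied one at a time and none is lost.

```python
import threading


class Record:
    """A single stored record with its own lock (hypothetical layout)."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def deposit(record, amount, times):
    """Apply repeated updates, holding the record's lock for each one."""
    for _ in range(times):
        with record.lock:  # only one thread may update the record at a time
            record.balance += amount


def run_concurrent_deposits(n_threads=4, times=1000):
    """Start several updaters against one record and return the final value."""
    record = Record(0)
    threads = [
        threading.Thread(target=deposit, args=(record, 1, times))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return record.balance
```

With the lock in place, the final balance always equals the number of threads times the number of deposits per thread.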
The hegemony of the relational
model has not gone without challenge.
For example, a rival appeared in the late
Further Reading
Codd, E.F.
A relational model of data for large shared
data banks. Comm. of the ACM 13, 6, June 1970.
Selinger, P.G., Astrahan, M.M., Chamberlin, D.D.,
Lorie, R.A., Price, T.G.
Access path selection in a relational
database management system. Proceedings
of the 1979 ACM SIGMOD International
Conference on Management of Data, 1979.
Ullman, J. D.
Principles of Database and Knowledge-Base
Systems: Volume II: The New Technologies,
W.H. Freeman & Co., New York, NY, 1990.
Hellerstein, J. M. and Stonebraker, M.
Readings in Database Systems (4th ed.), MIT
Press, Cambridge, MA, 2005.
Agrawal, R., et al.
The Claremont Report on Database
Research, University of California at
Berkeley, Sept. 2008.
http://db.cs.berkeley.edu/claremont/claremontreport08.pdf
Gary Anthes is a technology writer and editor based in
Arlington, VA.
© 2010 ACM 0001-0782/10/0500 $10.00
Looking Ahead With
Michael Stonebraker
An adjunct professor at Massachusetts Institute of Technology, Michael Stonebraker
is renowned as a database architect and a pioneer in several database technologies,
such as Ingres, PostgreSQL, and Mariposa (which he has commercial interests in). As
for the database industry’s future direction, Stonebraker says one-third of the market
will consist of relational database management systems used in large data warehouses,
such as corporate repositories of sales information. But the mainstream products in
use today, which store table data row by row, will face competition from new,
better-performing software that stores it column by column. “You can go wildly faster by
rotating your thinking 90 degrees,” he says.
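The difference between the two layouts can be shown with a toy example. The sketch below is illustrative only: the `sales` data, column names, and helper functions are assumptions, and real column stores add compression and vectorized execution that are not modeled here. In the row layout, summing one column means walking every full row; in the column layout, the values of that column already sit together in one list.

```python
def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_row_store(rows, column):
    """Row layout: visit every row and pick out the one field needed."""
    return sum(row[column] for row in rows)


def total_column_store(columns, column):
    """Column layout: read only the list that holds the needed column."""
    return sum(columns[column])


# Hypothetical sales table, stored row by row.
sales = [
    {"store": "A", "product": "widget", "amount": 100},
    {"store": "B", "product": "gadget", "amount": 250},
    {"store": "A", "product": "gadget", "amount": 100},
]
sales_by_column = to_columns(sales)
```

Both functions return the same total; the column version simply never touches the `store` or `product` values, which is where analytic queries over wide tables can save work.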
Another third of the market he believes will be in online transaction processing,
where databases tend to be smaller and transactions simpler. That means databases
can be kept in memory and locking can be avoided by processing transactions one at a
time. These “lightweight, main memory” systems, Stonebraker says, can run 50 times
faster than most online transaction processing systems in use today.
In the final third of the market, there are “a bunch of ideas,” depending on the type
of application, he says. One is streaming, where large streams of data flow through
queries without going to disk. Another type of nonrelational technology will store
semistructured data, such as XML and RDF. And a third approach, based on arrays
rather than tables, will be best for doing data clustering and complex analyses of very
large data sets.
Finally, “if you don’t care about performance,” says Stonebraker, “there are a bunch
of mature, open-source, one-size-fits-all DBMSs.”