Technical Perspective
One Size Fits All: An Idea Whose
Time Has Come and Gone
By Michael Stonebraker
Beginning in the early to mid-1980s,
the relational model of data has dominated the DBMS landscape. Moreover,
descendants of the early relational
prototypes (System R and Ingres) have
become the primary commercial relational DBMSs. As such, the basic
architecture sold by the commercial
vendors is more than two decades
old. In the meantime, the computers
on which DBMSs are deployed have advanced dramatically. Grids (blades)
have replaced shared-memory multiprocessors, CPU speeds have greatly
increased, main memory has gotten
much bigger and faster, and disks have
gotten a lot bigger (but have lagged
CPUs in bandwidth increase).
During the same period, several
new major applications of DBMS technology have emerged to complement
the business data processing market
for which RDBMSs were originally
designed. These include data warehouses, semi-structured data, and scientific data.
It now seems apparent that the traditional architecture of RDBMSs can
be beaten significantly (a factor of 25–50)
by a specialized implementation in
every major DBMS market. In the data
warehouse area, this implementation
appears to be a coded column store. A
column store represents data column-by-column
rather than in the traditional
row-by-row fashion. In a column architecture,
the execution engine must read only
those data elements relevant to the
query at hand, rather than all data
elements. Also, data compression is
much more effective in a column store
because one is compressing only one
type of data on a storage block rather
than several. As a result, less data is
brought from disk to main memory.
Moreover, if the execution engine operates on compressed data, then there
is less copying and better L2 cache utilization. Hence, CPU execution time is
dramatically reduced. These savings
have been realized in the original column stores from the 1990s (MonetDB
and Sybase IQ) as well as by more recent commercial products from Vertica, Infobright, and ParAccel.
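
To make the two claims above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the design of any product named in this article: the toy table, the column names, and the helper functions are all hypothetical, and run-length encoding is used simply as one familiar compression scheme that benefits from a block holding a single type of data.

```python
from itertools import groupby

# Hypothetical toy table of (order_id, region, amount) records.
rows = [
    (1, "east", 10),
    (2, "east", 20),
    (3, "east", 5),
    (4, "west", 7),
    (5, "west", 8),
]

def sum_amount_row_store(rows):
    """Row layout: every full record is visited to reach one field."""
    return sum(row[2] for row in rows)

# Column layout: each attribute is stored contiguously on its own.
columns = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def sum_amount_column_store(columns):
    """Column layout: only the one relevant column is read."""
    return sum(columns["amount"])

def rle_encode(values):
    """Run-length encode a column as a list of (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def count_by_value(runs):
    """Aggregate directly on the compressed runs, without decompressing."""
    counts = {}
    for value, length in runs:
        counts[value] = counts.get(value, 0) + length
    return counts

# A column of one data type with repeated values compresses into few runs.
region_runs = rle_encode(columns["region"])
```

In this sketch, the sum over `amount` touches a single list in the column layout, and `count_by_value` does one step per run rather than one per row, which is the sense in which operating on compressed data reduces copying and CPU work.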
The research presented here by
Boncz, Manegold, and Kersten documents these advantages, and is definitely worth reading. It focuses on
column execution and compression
in main memory and complements
other analyses of data warehouse disk
behavior. As such, it is exemplary of a
collection of recent papers on column
store implementation techniques
(in VLDB and SIGMOD) to which the
interested reader can turn for other
analyses.
In other database markets, including business data processing, specialized architectures offer similar
advantages. Papers analyzing early prototypes in these areas are beginning to
appear. In my opinion, we are seeing
“the beginning of the end” of the
“one-size-fits-all” systems sold by the major
DBMS vendors. I expect specialized
architectures to become dominant in
several DBMS application areas over
the next couple of decades for performance-conscious users. On the other
hand, at the low end, open source systems such as MySQL, Postgres, and
Ingres are gaining traction.
Expect to see a flurry of additional
papers exploring facets of specialized
architectures from the DBMS research
community. Furthermore, there has
been a collection of recent DBMS
startups with specialized implementations, and I expect there will be more
to come.
It should be clear that the DBMS
community is in transition from “the
old” to “the new.” The next decade
should be a period of vibrant activity
in our field.
Michael Stonebraker (stonebraker@csail.mit.edu) is
an adjunct professor in the Electrical Engineering and
Computer Science Department at MIT, Cambridge, MA,
and the chief technology officer of Vertica Systems, Inc.,
and Byledge Corp.