the developer to use or exclude major
subsystems depending on whether the
application needs them. Once a system
is sufficiently modular to permit a truly
small footprint, we will find that system deployed on an array of hardware
platforms with staggeringly large differences in capabilities. In these cases,
the system must be configurable to its
operating environment: the specific
hardware, operating system, and application using it.
Modularity
Some argue that database architecture
is in need of a revolution akin to the
RISC revolution in computer hardware.
The conventional monolithic DBMS architecture is not facile enough to adapt
to today’s data demands, so we must
build data management capabilities
out of a collection of small, simple,
reusable components. For example,
instead of viewing SQL support as a simple binary decision, Chaudhuri and Weikum
argue that query capabilities should be
provided at different levels of sophistication. The simplest level is a single-table selection processor backed by a B+ tree index that supports simple indexing, updating, and
selection. To this, you might add transactions. Continuing up the complexity hierarchy, consider a select-project-join processor. Next, add aggregates. In
this manner, you transform SQL from
a monolithic language into a family
of successively richer languages, each
of which is provided as a component
and satisfies a significant number of
application domains. Any particular
application selects the components it
needs. This idea of a component-based
architecture can be extended to include several other aspects of database
design: concurrency control, transactions, logging, and high availability.
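The language hierarchy described above can be sketched as a stack of small components. The following is a minimal illustration only, not Chaudhuri and Weikum's design: the class and function names (`SelectionProcessor`, `JoinProcessor`, `aggregate`) are hypothetical, and a sorted key list stands in for the B+ tree index.

```python
from bisect import bisect_left, insort


class SelectionProcessor:
    """Level 1: single-table insert, update, and key-range selection.

    Assumption: a sorted key list stands in for a real B+ tree index.
    """

    def __init__(self):
        self._rows = {}   # key -> row (a dict of column values)
        self._keys = []   # keys kept in sorted order

    def put(self, key, row):
        # Insert a new row or update an existing one.
        if key not in self._rows:
            insort(self._keys, key)
        self._rows[key] = row

    def get(self, key):
        return self._rows.get(key)

    def select(self, lo, hi, predicate=lambda row: True):
        """Return rows with lo <= key < hi that satisfy predicate."""
        start = bisect_left(self._keys, lo)
        end = bisect_left(self._keys, hi)
        return [self._rows[k] for k in self._keys[start:end]
                if predicate(self._rows[k])]


class JoinProcessor:
    """Level 2: select-project-join, layered on two level-1 tables."""

    def __init__(self, left, right):
        self.left = left
        self.right = right

    def join(self, lo, hi, on, columns):
        """Match left column `on` to the right table's key, then project."""
        out = []
        for lrow in self.left.select(lo, hi):
            rrow = self.right.get(lrow[on])
            if rrow is not None:
                merged = {**lrow, **rrow}
                out.append({c: merged[c] for c in columns})
        return out


def aggregate(rows, column, fn=sum):
    """Level 3: aggregates, used only by applications that need them."""
    return fn(row[column] for row in rows)
```

An application that needs only keyed lookups would use `SelectionProcessor` alone; one that needs joins or aggregates would add the higher layers without changing the lower ones.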
Concurrency control lends itself to
a hierarchy similar to that presented in
the language example. Some applications are completely single-threaded
and require no locking; others have low
levels of concurrency and would be well
served by table-level locks or API-level
locks (allowing only one writer or multiple readers into the database system
simultaneously); finally, highly concurrent applications need fine-grain
locking and multiple degrees of isolation (potentially allowing applications
to see values that have been written by
incomplete transactions). 6 In a conventional database management system,
locking is assumed; in the brave new
world discussed here, locking is optional and different components can
be used to provide different levels of
concurrency.
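One way to picture locking as an optional component is a store that accepts a locking policy when it is constructed. The sketch below is an assumption-laden illustration, not an existing system's API: `NoLocking`, `ApiLevelLock`, and `Store` are hypothetical names, and the fine-grain level (per-record locks with multiple isolation degrees) is omitted for brevity.

```python
import threading
from contextlib import contextmanager


class NoLocking:
    """For single-threaded applications: acquiring a lock is a no-op."""

    @contextmanager
    def read(self, key=None):
        yield

    @contextmanager
    def write(self, key=None):
        yield


class ApiLevelLock:
    """Admits either one writer or many readers into the store at a time."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    @contextmanager
    def read(self, key=None):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                self._cond.notify_all()

    @contextmanager
    def write(self, key=None):
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()


class Store:
    """A key-value store whose concurrency control is a pluggable component."""

    def __init__(self, locking):
        self._data = {}
        self._locking = locking

    def get(self, key):
        with self._locking.read(key):
            return self._data.get(key)

    def put(self, key, value):
        with self._locking.write(key):
            self._data[key] = value
```

A single-threaded application would construct `Store(NoLocking())` and pay nothing for concurrency control, while a lightly concurrent one would pass `ApiLevelLock()`. The `key` argument is unused by both policies here; it is in the interface so that a finer-grain policy could lock individual records.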
Transactions provide the illusion
that a collection of operations is applied to a database as an atomic unit
and that once applied, the operations
will persist, even in the face of application or system failure. Transaction
management is at the heart of most database management systems, yet many
applications do not require transactions. In a component-based world,
transactions, too, are optional. When
they are present, a system might still
have a number of different components
providing basic transactional mechanisms, savepoints (the ability to identify a point in time to which the database
may be rolled back), two-phase commit
to support transactions that span multiple databases, nested transactions
to decompose a large operation into a
number of smaller ones, and compensating transactions to undo high-level,
logical operations.
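To make the savepoint mechanism concrete, here is a minimal in-memory sketch built on an undo list. It is an illustration under stated assumptions: the `Transaction` class is hypothetical, it covers atomic rollback only, and it provides neither durability nor isolation.

```python
class Transaction:
    """Minimal in-memory transaction with savepoints, built on an undo list."""

    _MISSING = object()   # marks a key that did not exist before the change

    def __init__(self, data):
        self._data = data   # the dict being modified in place
        self._undo = []     # (key, previous value) pairs, oldest first

    def put(self, key, value):
        # Record the old value before overwriting so it can be restored.
        self._undo.append((key, self._data.get(key, self._MISSING)))
        self._data[key] = value

    def savepoint(self):
        """Return a marker identifying the current point in the transaction."""
        return len(self._undo)

    def rollback(self, savepoint=0):
        """Undo changes made after the savepoint (by default, all of them)."""
        while len(self._undo) > savepoint:
            key, old = self._undo.pop()
            if old is self._MISSING:
                self._data.pop(key, None)
            else:
                self._data[key] = old

    def commit(self):
        # Discard undo information; the changes can no longer be rolled back.
        self._undo.clear()


db = {"a": 1}
txn = Transaction(db)
txn.put("a", 2)
sp = txn.savepoint()
txn.put("b", 3)
txn.rollback(sp)   # undoes only the write to "b"
txn.rollback()     # undoes the remaining write to "a"
```

An application that never rolls back could omit this component entirely and write to the underlying store directly.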
Many transaction systems use some
form of logging to provide rollback and
recovery capabilities. In that context,
it hardly seems necessary to treat logging as a separable component, but it
should be. A transactional component
might be designed to work with multiple implementations, some of which
do not use logging (for example, no-overwrite schemes such as shadow pages). Perhaps even more interesting, a
logging system might be useful outside
the context of transactions; it might be
used for auditing or provide some sort
of backup mechanism. In either case,
it should be an application designer’s
decision whether logging is necessary
rather than having it imposed by the
database vendor.
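The idea of a log that is useful without transactions can be sketched as follows. This is a minimal illustration with hypothetical names (`AppendOnlyLog`, `AuditedStore`, `rebuild`); the log lives in memory, whereas a real component would write to stable storage.

```python
import json


class AppendOnlyLog:
    """A standalone log: append records, then read them back in order."""

    def __init__(self):
        self._records = []   # assumption: in memory, not on stable storage

    def append(self, record):
        self._records.append(json.dumps(record, sort_keys=True))
        return len(self._records) - 1   # position of the record in the log

    def replay(self, start=0):
        for line in self._records[start:]:
            yield json.loads(line)


class AuditedStore:
    """A store that uses the log for auditing only, with no transactions."""

    def __init__(self, log=None):
        self._data = {}
        self._log = log   # None means the application opted out of logging

    def put(self, key, value, user):
        if self._log is not None:
            self._log.append(
                {"op": "put", "key": key, "value": value, "user": user})
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def rebuild(log):
    """Backup use: reconstruct the store's contents by replaying the log."""
    data = {}
    for record in log.replay():
        if record["op"] == "put":
            data[record["key"]] = record["value"]
    return data
```

Because the log is passed in rather than built in, the application designer decides whether to log at all, and the same log can serve an audit trail, a backup, or a transactional component.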
Finally, data is sometimes so critical
that downtime is unacceptable. Many
database systems provide replicated
or highly available systems to address
this need. Although this functionality is
often available as an add-on, today’s
systems have not gone far enough.
A developer may wish to use a database’s HA (high-availability) configuration, but pair it with
some other company’s HA substrate. If