attracted the broadest interest within
the group.
In addition to the listed topics, the
main issues raised during the meeting
included management of uncertain
information, data privacy and security,
e-science and other scholarly applications, human-centric interaction
with data, social networks and Web
2.0, personalization and contextualization of query- and search-related
tasks, streaming and networked data,
self-tuning and adaptive systems, and
the challenges raised by new hardware
technologies and energy constraints.
Most are captured in the following
discussion, with many cutting across
multiple topics.
Revisiting database engines. System R
and Ingres pioneered the architecture
and algorithms of relational databases;
current commercial databases are still
based on their designs. But many of the
changes in applications and technology demand a reformation of the entire
system stack for data management.
Current big-market relational database
systems have well-known limitations.
While they provide a range of features, they deliver peak performance only in narrow regimes: online transaction processing (OLTP) systems are tuned for many small, concurrent debit/credit-style transactions, while decision-support systems are tuned for a few read-mostly, large-join-and-aggregation workloads. Meanwhile, for many popular data-intensive
tasks developed over the past decade,
relational databases provide poor
price/performance and have been
rejected; critical scenarios include
text indexing, serving Web pages, and
media delivery. New workloads are
emerging in the sciences, Web 2.0-style
applications, and other environments
where database-engine technology
could prove useful but is not bundled
in current database systems.
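To make the OLTP/decision-support contrast above concrete, the following minimal sketch runs both workload shapes side by side, using SQLite purely as a stand-in engine; the accounts, region, and sales tables and all column names are invented for illustration, not drawn from any particular system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE region   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales    (region_id INTEGER REFERENCES region(id),
                           amount REAL);
    INSERT INTO accounts VALUES (1, 100.0), (2, 50.0);
    INSERT INTO region   VALUES (1, 'east'), (2, 'west');
    INSERT INTO sales    VALUES (1, 10.0), (1, 20.0), (2, 5.0);
""")

# OLTP-style workload: many short, concurrent debit/credit
# transactions, each touching a handful of rows.
with conn:  # one small atomic transaction
    conn.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")

# Decision-support-style workload: a read-mostly query that joins
# and aggregates over large tables (tiny here, but the shape is the point).
for name, total in conn.execute("""
    SELECT r.name, SUM(s.amount)
    FROM sales s JOIN region r ON s.region_id = r.id
    GROUP BY r.name
"""):
    print(name, total)
```

An engine tuned for the first pattern optimizes for locking, logging, and high concurrency on point updates; one tuned for the second optimizes for scan bandwidth, join order, and aggregation, which is why neither regime's tuning transfers cleanly to the other.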
Even within traditional application domains, the database marketplace today suggests there is room for
significant innovation. For example, in
the analytics markets for business and
science, customers can buy petabytes
of storage and thousands of processors, but the dominant commercial
database systems typically cannot
scale that far for many workloads. Even
when they can, the cost of software and
management relative to hardware is
exorbitant. In the OLTP market, business imperatives like regulatory compliance and rapid response to changing
business conditions raise the need to
address data life-cycle issues (such as
data provenance, schema evolution,
and versioning).
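As a minimal illustration of one such life-cycle concern, the sketch below tracks schema evolution with an explicit version table, a common migrations pattern rather than anything prescribed here; the customer table and version numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schema_version (version INTEGER, applied_at TEXT)")

# Each migration is recorded, so the schema's history (one aspect of
# data life-cycle management) stays queryable alongside the data.
migrations = {
    1: "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)",
    2: "ALTER TABLE customer ADD COLUMN email TEXT",  # schema evolves
}
for version in sorted(migrations):
    with conn:  # apply the change and record it atomically
        conn.execute(migrations[version])
        conn.execute(
            "INSERT INTO schema_version VALUES (?, datetime('now'))",
            (version,),
        )

print(conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0])
```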
Given these requirements, the
commercial database market is wide
open to new ideas and systems, as
reflected in the recent funding climate
for entrepreneurs. It is difficult to
recall when there were so many startup companies developing database
engines, and the challenging economy
has not trimmed the field much. The
market will undoubtedly consolidate
over time, but things are changing fast,
and it remains a good time to try radical ideas.
Some research projects have begun
taking revolutionary steps in database
system architecture. There are two
distinct directions: broadening the
useful range of applicability for multipurpose database systems (for example, to incorporate streams, text search,
XML, and information integration)
and radically improving performance
by designing special-purpose database
systems for specific domains (for example, read-mostly analytics, streams,
and XML). Both directions have merit,
and the overlap in their stated targets
suggests they may be more synergistic
than not. Special-purpose techniques
(such as new storage and compression formats) may be reusable in more
general-purpose systems, and general-purpose architectural components
(such as extensible query optimizer
frameworks) may help speed prototyping of new special-purpose systems.
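As one concrete instance of a special-purpose technique that could migrate into general-purpose engines, the sketch below applies run-length encoding to a sorted column, the kind of storage-format idea read-mostly analytic systems exploit; the function names are ours, invented for illustration.

```python
from itertools import groupby

def rle_encode(column):
    """Run-length encode a column; sorted, low-cardinality columns
    (common in read-mostly analytic stores) compress very well."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def rle_sum(encoded):
    # Aggregation can run directly on the compressed representation,
    # skipping decompression entirely.
    return sum(value * count for value, count in encoded)

column = [3, 3, 3, 3, 7, 7, 9, 9, 9]   # e.g., a sorted fact-table column
encoded = rle_encode(column)            # [(3, 4), (7, 2), (9, 3)]
assert rle_sum(encoded) == sum(column)
print(encoded)
```

The same encoder could sit behind an extensible storage interface in a multipurpose engine, which is the sense in which the two directions may reinforce each other.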
Important research topics in the
core database engine area include:
• Designing systems for clusters
of many-core processors that exhibit
limited and nonuniform access to off-chip memory;
• Exploiting remote RAM and Flash
as persistent media, rather than relying solely on magnetic disk;
• Treating query optimization and
physical data layout as a unified, adaptive, self-tuning task to be carried out
continuously;
• Compressing and encrypting data
at the storage layer, integrated with
data layout and query optimization;