• Designing systems that embrace
nonrelational data models, rather than
shoehorning them into tables;
• Trading off consistency and availability for better performance and
scaling to thousands of machines; and
• Designing power-aware DBMSs
that limit energy costs without sacrificing scalability.
This list is not exhaustive. One
industrial participant at the Claremont
meeting noted that this is a time of
opportunity for academic researchers; the landscape has shifted enough
that access to industrial legacy code
provides little advantage, and large-scale clustered hardware is rentable in
the cloud at low cost. Moreover, industrial players and investors are aggressively looking for bold new ideas. This
opportunity for academics to lead in
system design is a major change in the
research environment.
Declarative programming for emerging platforms. Programmer productivity
is a key long-acknowledged challenge
in computing, with its most notable
mention in the database context in Jim
Gray’s 1998 Turing lecture. Today, the
urgency of the challenge is increasing
exponentially as programmers target
ever more complex environments,
including many-core chips, distributed services, and cloud computing
platforms.
Nonexpert programmers must be
able to write robust code that scales out
across processors in both loosely and
tightly coupled architectures. Although
developing new programming paradigms is not a database problem per se,
ideas of data independence, declarative programming, and cost-based optimization provide a promising angle of
attack. There is substantial evidence
that data-centric approaches will
strongly influence programming
in the near term.
The recent popularity of the MapReduce programming framework for
manipulating big data sets is an
example of this potential. MapReduce
is attractively simple, building on
language and data-parallelism techniques that have been known for
decades. For database researchers,
the significance of MapReduce is in
demonstrating the benefits of data-parallel programming to new classes
of developers.
This opens opportunities for the
database community to extend its
contribution to the broader community, developing more powerful and
efficient languages and runtime mechanisms that help these developers
address more complex problems.
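To make the data-parallel style concrete, the following is a minimal single-process sketch of the map, shuffle, and reduce steps, using word counting as the task. It is an illustration only: the function names (map_phase, shuffle, reduce_phase, run_word_count) are hypothetical and do not correspond to the API of any particular MapReduce implementation, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(records, mapper):
    """Apply the mapper to every record; each call yields (key, value) pairs."""
    return chain.from_iterable(mapper(record) for record in records)


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer independently to each key's list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    """Emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    """Sum the per-occurrence counts for one word."""
    return sum(counts)


def run_word_count(lines):
    """Chain the three phases together in a single process (assumption:
    no partitioning, fault tolerance, or distribution is modeled here)."""
    pairs = map_phase(lines, word_count_mapper)
    return reduce_phase(shuffle(pairs), word_count_reducer)


if __name__ == "__main__":
    print(run_word_count(["big data sets", "big clusters"]))
```

The programmer supplies only the mapper and the reducer. Because each record is mapped independently and each key is reduced independently, a runtime is free to run both phases in parallel without any change to the user's code.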
As another example of declarative
programming, in the past five years a
variety of new declarative languages,
often grounded in Datalog, have been
developed for domain-specific systems
in fields as diverse as networking and
distributed systems, computer games,
machine learning and robotics, compilers, security protocols, and information
extraction. In many of these scenarios,
the use of a declarative language has
reduced code size by orders of magnitude while also enabling distributed
or parallel execution. Surprisingly, the
groups behind these efforts have coordinated very little with one another; the
move to revive declarative languages
in these new contexts has grown up
organically.
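As a rough illustration of why such languages can be compact, consider network reachability, which takes two rules in Datalog: reach(X, Y) :- link(X, Y) and reach(X, Z) :- reach(X, Y), link(Y, Z). The sketch below evaluates those two rules by naive bottom-up iteration to a fixpoint. It assumes a small in-memory set of link facts; the function name reachable is hypothetical, and production Datalog engines use more efficient strategies such as semi-naive evaluation.

```python
def reachable(links):
    """Naive bottom-up evaluation of the two Datalog rules:

        reach(X, Y) :- link(X, Y).
        reach(X, Z) :- reach(X, Y), link(Y, Z).

    `links` is a set of (source, destination) pairs.
    """
    # Rule 1: every link is a reach fact.
    reach = set(links)
    while True:
        # Rule 2: join reach(X, Y) with link(Y, Z) on the shared variable Y.
        derived = {
            (x, z)
            for (x, y) in reach
            for (y2, z) in links
            if y == y2
        }
        new_facts = derived - reach
        if not new_facts:
            # Fixpoint: applying the rules again derives nothing new.
            return reach
        reach |= new_facts


if __name__ == "__main__":
    facts = {("a", "b"), ("b", "c"), ("c", "d")}
    for pair in sorted(reachable(facts)):
        print(pair)
```

The rules state what holds, not the order in which facts are derived, so an engine can choose the evaluation strategy, including partitioning the join across machines.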
A third example arises in enterprise-application programming.
Recent frameworks and language extensions (such
as Ruby on Rails and LINQ) encourage query-like logic in programmer
design patterns. But these packages
have yet to address the challenge of
enterprise-style programming across
multiple machines; the closest effort
here is DryadLINQ, focusing on parallel analytics rather than on distributed
application development. For enterprise applications, a key distributed
design decision is the partitioning of
logic and data across multiple “tiers,”
including Web clients, Web servers,
application servers, and a backend
DBMS. Data independence is particularly valuable here, allowing programs
to be specified without making a priori
permanent decisions about physical
deployment across tiers. Automatic
optimization processes could make
these decisions and move data and
code as needed to achieve efficiency
and correctness. XQuery has been
proposed as an existing language that
would facilitate this kind of declarative
programming, in part because XML is
often used in cross-tier protocols.
It is unusual to see this much
energy surrounding new data-centric
programming techniques, but the
opportunity brings challenges as