Prior to the meeting, a team led by
one of the participants performed a
bit of ad hoc data analysis over database conference bibliographies from
the DBLP repository (dblp.uni-trier.de). While the effort was not scientific, the results indicated that the
database research community has
doubled in size over the past decade,
as suggested by several metrics:
number of published papers, number
of distinct authors, number of distinct
institutions to which these authors
belong, and number of session topics
at conferences, loosely defined. This
served as a backdrop to the discussion that followed. An open question is
whether this phenomenon is emerging
at larger scales—in computer science
and in science in general. If so, it may
be useful to discuss the management
of growth at those larger scales.
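The kind of ad hoc bibliography analysis described above can be sketched in a few lines. This is only an illustration, not the analysis the participants ran: the record layout (a `year` field and an `authors` list), the function names, and the toy records are all assumptions made for the example, and none of the values come from DBLP.

```python
from collections import defaultdict

def yearly_metrics(records):
    """Count papers and distinct authors per publication year.

    `records` is an iterable of dicts with the hypothetical keys
    'year' (int) and 'authors' (list of str).
    """
    papers = defaultdict(int)
    authors = defaultdict(set)
    for rec in records:
        papers[rec["year"]] += 1
        authors[rec["year"]].update(rec["authors"])
    return {
        year: {"papers": papers[year], "distinct_authors": len(authors[year])}
        for year in sorted(papers)
    }

def growth_ratio(metrics, start_year, end_year, key):
    """Ratio of a metric between two years (2.0 means it doubled)."""
    return metrics[end_year][key] / metrics[start_year][key]

# Placeholder records for illustration only; not real bibliography data.
toy = [
    {"year": 1998, "authors": ["A", "B"]},
    {"year": 2008, "authors": ["A", "C"]},
    {"year": 2008, "authors": ["C", "D", "E"]},
]
m = yearly_metrics(toy)
```

Calling `growth_ratio(m, 1998, 2008, "papers")` on the output compares the two endpoints of a decade. Metrics such as distinct institutions or session topics would need extra fields and some normalization of names, which is where most of the effort in such an analysis is likely to go.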
The growth of the database community puts pressure on the content
and processes of database research
publications. In terms of content, the
increasingly technical scope of the
community makes it difficult for individual researchers to keep track of the
field. As a result, survey articles and
tutorials are increasingly important to
the community. These efforts should
be encouraged informally within the
community, as well as via professional
incentive structures (such as academic
tenure and promotion in industrial
labs). In terms of processes, the reviewing load for papers is increasingly
burdensome, and there was a perception at the Claremont meeting that the
quality of reviews had been decreasing.
It was suggested at the meeting that the
lack of face-to-face program-committee meetings in recent years has exacerbated the problem of poor reviews
and removed opportunities for risky or
speculative papers to be championed
effectively over well-executed but more
pedestrian work.
There was some discussion at the
meeting about recent efforts—
notably by ACM-SIGMOD and VLDB—
to enhance the professionalism of
papers and the reviewing process via
such mechanisms as double-blind
reviewing and techniques to encourage experimental repeatability. Many
participants were skeptical that the
efforts to date have contributed to long-term research quality, as measured in
intellectual and practical relevance. At
the same time, it was acknowledged
that the database community’s growth
increases the need for clear and clearly
enforced processes for scientific publication. The challenge going forward
is to find policies that simultaneously reward big ideas and risk-taking
while providing clear and fair rules for
achieving these rewards. The publication venues would do well to focus as
much energy on processes to encourage relevance and innovation as they
do on processes to encourage rigor
and discipline.
In addition to tuning the mainstream publication venues, there is an
opportunity to take advantage of other
channels of communication. For example, the database research community
has had little presence in the relatively
active market for technical books.
Given the growing population of developers working with big data sets, there
is a need for accessible books on scalable data-management algorithms
and techniques that programmers can
use to build software. The current crop
of college textbooks is not targeted at
this market. There is also an opportunity to present database research
contributions as big ideas in their own
right, targeted at intellectually curious
readers outside the specialty. In addition to books, electronic media (such
as blogs and wikis) can complement
technical papers by opening up different stages of the research life cycle to
discussion, including status reports
on ongoing projects, concise presentation of big ideas, vision statements,
and speculation. Online fora can also
spur debate and discussion if appropriately provocative. Electronic media
underscore the modern reality that
it is easy to be widely published but
much more difficult to be widely read.
This point should be reflected in the
mainstream publication context, as
well as by authors and reviewers. In the
end, the consumers of an idea define
its value.
Given the growth in the database
research community, the time is ripe
for ambitious projects to stimulate
collaboration and cross-fertilization
of ideas. One proposal is to foster
more data-driven research by building
a globally shared collection of structured data, accepting contributions
from all parties. Unlike previous efforts
in this vein, the collection should not
be designed for any particular benchmark; in fact, it is likely that most of the
interesting problems suggested by this
data are as yet unidentified.
There was also discussion at the
meeting of the role of open source
software development in the database
community. Despite a tradition of open
source software, academic database
researchers have only rarely reused
or shared software. Given the current
climate, it might be useful to move more
aggressively toward sharing software
and collaborating on software projects
across institutions. Information integration was mentioned as an area in
which such an effort is emerging.
Finally, interest was expressed
in technical competitions akin to
the Netflix Prize (www.netflixprize.com) and KDD Cup (www.sigkdd.org/kddcup/index.php) competitions.
To kick off this effort in the database
domain, meeting participants identified two promising areas for competitions: system components for cloud
computing (likely measured in terms
of efficiency) and large-scale information extraction (likely measured
in terms of accuracy and efficiency).
While it was noted that each of these
proposals requires a great deal of time
and care to realize, several participants
volunteered to initiate efforts. That
work has begun with the 2009 SIGMOD
Programming Contest (db.csail.mit.edu/sigmod09contest).
Correspondence regarding this article should be addressed to Joseph M. Hellerstein (hellerstein@cs.berkeley.edu).
© 2009 ACM 0001-0782/09/0600 $10.00