proach. A key challenge, then, is to obtain high-quality datasets from a process based on often-imperfect human
curators. We need to build platforms
that allow people to curate data easily and extend relevant applications to
incorporate such curation. For these
people-centric challenges, data provenance and explanation will be crucial,
as will privacy and security.
Data consumers. People want to use
messier data in complex ways, raising
many challenges. In the enterprise,
data consumers usually know how to
ask SQL queries over a structured database. Today’s data consumers may
not know how to formulate a query
at all, for example, a journalist who
wants to “find the average temperature of all cities with population over
100,000 in Florida” over a structured
dataset. Enabling people to get such
answers themselves requires new
query interfaces, for example, based
on multi-touch, not just console-based SQL. We need multimodal interfaces that combine visualization,
querying, and navigation. When the
query to ask is not clear, people need
other ways to browse, explore, visualize, and mine the data, to make data
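To make the journalist's question concrete, the following minimal sketch shows the kind of SQL that a console-based interface would expect the user to write. The table name `cities`, its columns, and the sample rows are hypothetical, invented here purely for illustration; the question as posed does not come with a schema.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cities (name TEXT, state TEXT, population INTEGER, avg_temp_f REAL)"
)
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?, ?)",
    [
        ("City A", "Florida", 250000, 72.0),
        ("City B", "Florida", 120000, 70.0),
        ("City C", "Florida", 40000, 68.0),   # below the population threshold
        ("City D", "Georgia", 500000, 62.0),  # not in Florida
    ],
)


def average_temperature(conn, state, min_population):
    """Average temperature over cities in `state` with population above `min_population`.

    Returns None when no city matches, because SQL AVG over zero rows is NULL.
    """
    row = conn.execute(
        "SELECT AVG(avg_temp_f) FROM cities WHERE state = ? AND population > ?",
        (state, min_population),
    ).fetchone()
    return row[0]


result = average_temperature(conn, "Florida", 100000)
print(result)
```

Even this small query assumes the user knows the table layout, the filter syntax, and the aggregate function, which is the gap that new query interfaces would need to close.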
Online communities. People want to
create, share, and manage data with
other community members. They may
want to collaboratively build community-specific knowledge bases, wikis,
and tools to process data. For example,
many researchers have created their
own pages on Google Scholar, thereby contributing to this “community”
knowledge base. Our challenge is to
build tools to help communities produce usable data as well as to exploit,
share, and mine it.
In addition to research challenges, the
database field faces many community
issues. These include database education, data science, and research culture.
Some of these are new, brought about
by big data. Other issues, while not new,
are exacerbated by big data and are becoming increasingly important.
Database education. The database
technology taught in standard database
courses today is increasingly disconnected from reality. It is rooted in the
1980s, when memory was small relative
to database size, making I/O the bottleneck
to most database operations, and
when servers used relatively expensive
single-core processors. Today, many
databases fit in main memory, and
many-core servers make parallelism
and cache behavior critical to database
performance. Moreover, although SQL
DBMSs are still widely used, so are key-
value stores, data stream processors,
and MapReduce frameworks. It is time
to rethink the database curriculum.
Data science. As we discussed earlier, big data has generated a rapidly
growing demand for data scientists
who can transform large volumes of
data into actionable knowledge. Data
scientists need skills not only in data
management, but also in business intelligence, computer systems, mathematics, statistics, machine learning,
and optimization. New cross-disciplinary programs are needed to provide this broad education. Successful
research and educational efforts related to data science will require close
collaboration with these other disciplines and with domain specialists.
Big data presents computer science
with an opportunity to influence the
curricula of chemistry, earth sciences,
sociology, physics, biology, and many
other fields. The small computer science parts of those curricula could
be grown and redirected to give data
management and data science a more
Research culture. Finally, there is
much concern over the increased emphasis on citation counts instead of
research impact. This discourages
large systems projects, end-to-end tool
building, and sharing of large datasets,
since this work usually takes longer
than solving point problems. Program
committees that value technical depth
on narrow topics over the potential
for real impact are partly to blame. It
is unclear how to change this culture.
However, to pursue the big data agenda effectively, the field needs to return
to a state where fewer publications per
researcher per time unit is the norm,
and where large systems projects, end-to-end tool sets, and data sharing are
more highly valued.
This is an exciting time for database research.
In the past it has been guided
by, but also restricted by, the rigors of
the enterprise and relational database
systems. The rise of big data and the
vision of a data-driven world present
many exciting new research challenges
related to processing big data; handling
data diversity; exploiting new hardware,
software, and cloud-based platforms;
addressing the data life cycle, from creating
data to analyzing and sharing it;
and facing the diversity, roles, and number
of people related to all aspects of
data. It is also time to rethink approaches
to education, involvement with data
consumers, and our value system and
its impact on how we evaluate, disseminate,
and fund our research.
Acknowledgments. We thank the
reviewers for invaluable suggestions.
The Beckman meeting was supported by donations from the Professor Ram Kumar Memorial Foundation, Microsoft Corporation, and
The following authors served as editors of this article
(the third author also served as corresponding author):
Philip A. Bernstein (email@example.com) is
a Distinguished Scientist at Microsoft Research,
Michael J. Carey (firstname.lastname@example.org) is a professor in
the Bren School of Information and Computer Sciences at
the University of California, Irvine.
AnHai Doan (email@example.com) is a professor in the
Department of Computer Science at the University of
© 2016 ACM 0001-0782/16/2 $15.00