in MODIS, when newer analysis software becomes available, the entire
database of products is reanalyzed,
yielding a complete new version; old
versions are not kept. While this is
undesirable from the standpoint of
provenance and reproducibility, the citation still carries useful information,
even though its referent may not exist.
We have addressed a critical issue in the adoption of data citation: automatically generating a citation from the query and database that were used to obtain the data. A preliminary implementation of the rule-based citation language for hierarchical data is given in Buneman and Silvello.5 What we have described
here is quite general and applies to any
database with a well-defined query language. Rewriting queries through views
was originally developed for query optimization and subsequently exploited
in data integration. The idea of using
views for data citation bears some relationship to that of using them to define
security levels in a database.
Using database views to specify citable units is the key to both specifying
and generating citations. It is important
for data publishers who want their data
to be properly cited to define these views
and ensure the data necessary to generate the citation from them is available.
We have shown how this can be done for two quite different scientific databases and believe the idea can work on other forms of data (such as RDF25) and for databases in other fields, including the humanities. We have looked at some examples, and the main barrier is that the data needed to generate the citation may not be available, either in the database or in some metadata repository.l
In this article, we have focused on the problem of automatically generating citations, but it is almost impossible to address it in isolation from other topics (such as citation standards). For example, the citation snippets required by the curators of our two examples do not quite conform to the DataCite metadata schema.9 Although DataCite has an entry for a spatial bounding box, it does not have one for a temporal interval, as required by MODIS. A good problem for database research is to determine whether citations generated by a rule are consistent with a given citation standard.

l See use cases and linked data sections in the annotated bibliography in the online appendix.
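If each rule declares which fields its citations contain, a crude, field-level version of that consistency question can be checked statically. The Python sketch below assumes exactly that. The standard it checks against is a hypothetical, simplified stand-in, and its field names (creator, spatial_bbox, temporal_interval, and so on) are illustrative rather than taken from the actual DataCite schema. The MODIS-like rule mirrors the mismatch described above: the standard has a place for the bounding box but not for the temporal interval. A real consistency check would also have to consider value formats, as well as fields that a rule emits only for some queries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CitationStandard:
    """Hypothetical, simplified metadata standard: the fields it defines."""
    name: str
    required: frozenset
    optional: frozenset

    def supported(self) -> frozenset:
        return self.required | self.optional

@dataclass(frozen=True)
class CitationRule:
    """A citation rule, summarized by the fields every citation it emits has."""
    name: str
    emits: frozenset

def check_rule(rule: CitationRule, standard: CitationStandard):
    """Return (missing, unsupported): required fields the rule never emits,
    and emitted fields for which the standard has no entry. Two empty lists
    mean the rule fits the standard at the level of field names only."""
    missing = standard.required - rule.emits
    unsupported = rule.emits - standard.supported()
    return sorted(missing), sorted(unsupported)

# Illustrative field names; not the real DataCite property names.
TOY_STANDARD = CitationStandard(
    name="ToyStandard",
    required=frozenset({"creator", "title", "publisher", "year", "identifier"}),
    optional=frozenset({"spatial_bbox", "version"}),
)

MODIS_LIKE_RULE = CitationRule(
    name="modis_like",
    emits=frozenset({"creator", "title", "publisher", "year", "identifier",
                     "spatial_bbox", "temporal_interval"}),
)

if __name__ == "__main__":
    print(check_rule(MODIS_LIKE_RULE, TOY_STANDARD))  # ([], ['temporal_interval'])
```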
We also mentioned archiving (ensuring fixity) and provenance as related computational challenges, but there are many
others. We have tacitly assumed a rather
conventional view of citations and how
they are used, but there are many ways in
which the form and use of citations may
change radically (such as papers with
10,000 authors or papers with 10,000
references). Maybe, by analogy with PageRank,19 there should be some notion of
transitivity of credit in citation. These new
approaches to the content, structure, and
use of citations are all likely to require new
ideas from computer science.
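The PageRank analogy can at least be made concrete. The Python sketch below runs PageRank's power iteration on a small citation graph. Papers and datasets are nodes, and an edge means "cites." Credit that reaches a paper is passed on, in equal shares, to whatever it cites, so a dataset also accumulates credit from papers that cite it only indirectly. This is plain PageRank applied to an invented graph with hypothetical node names. It is not a proposal from this article for how credit should be assigned. The damping factor 0.85 is simply the value commonly used for PageRank.

```python
def transitive_credit(cites, damping=0.85, iters=100, tol=1e-12):
    """PageRank-style power iteration over a citation graph.

    cites maps each item to the list of distinct items it cites.
    Returns a dict of credit shares that sum to 1.
    """
    nodes = sorted(set(cites) | {t for ts in cites.values() for t in ts})
    n = len(nodes)
    credit = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every item keeps a small baseline share (PageRank's teleport term).
        nxt = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = cites.get(v, [])
            if targets:
                # Pass credit on, in equal parts, to everything v cites.
                share = damping * credit[v] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Items that cite nothing spread their credit uniformly.
                for u in nodes:
                    nxt[u] += damping * credit[v] / n
        delta = sum(abs(nxt[v] - credit[v]) for v in nodes)
        credit = nxt
        if delta < tol:
            break
    return credit

# Hypothetical graph: paper_A cites paper_B and dataset_D; paper_B cites dataset_D.
EXAMPLE_GRAPH = {
    "paper_A": ["paper_B", "dataset_D"],
    "paper_B": ["dataset_D"],
    "dataset_D": [],
}
```

In the example graph, both papers cite dataset_D directly, and it also receives credit indirectly through paper_B. It therefore ends up with the largest share of credit. Nothing cites paper_A, so it receives the least.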
Tony Harmar, who led the development
of GtoPdb, introduced us to the problem
of data citation. We are also indebted to
Sarah Cohen Boulakia, Jamie Davies,
Wenfei Fan, Andreas Rauber, Joanna
Sharman, Gianmaria Silvello, and the
reviewers for much useful input. This
work is supported by National Science
Foundation Information and Intelligent Systems grant 1302212 and by Engineering and Physical Sciences Research Council grant EP/J017728/1, The Theory and Practice of Social Machines.
1. Altman, M. and King, G. A proposed standard for the
scholarly citation of quantitative data. D-Lib Magazine
13, 3/4 (Mar./Apr. 2007).
2. American Geophysical Union. AGU Publications Data
Policy. Washington, D.C., Dec. 2013; http://publications.
data-policy/ (accessed Nov. 2015)
3. American Meteorological Society. Data Archiving
and Citation. Boston, MA; https://www.ametsoc.org/ams/index.cfm/publications/authors/journal-and-bams-authors/journal-and-bams-authors-guide/data-archiving-and-citation/ (accessed Nov. 2015)
4. Ball, A. and Duke, M. How to Cite Datasets and Link to
Publications. The Digital Curation Centre, Edinburgh,
U.K., 2012; http://www.dcc.ac.uk/resources/how-guides/cite-datasets (accessed Nov. 2015)
5. Buneman, P. and Silvello, G. A rule-based citation
system for structured and evolving datasets. IEEE
Data Engineering Bulletin 33, 3 (2010), 33–41.
6. Coalition on Publishing Data in the Earth and Space
Sciences (COPDESS). Statement of commitment
from earth and space science publishers and data
facilities, 2015; http://www.copdess.org/statement-of-commitment/ (accessed Nov. 2015)
7. CODATA-ICSTI Task Group on Data Citation
Standards and Practices. Out of cite, out of mind: The
current state of practice, policy, and technology for the
citation of data. Data Science Journal 12, 0 (2013),
8. Data Observation Network for Earth (DataONE). Data
Citation and Attribution; https://www.dataone.org/
citing-dataone (accessed Nov. 2015)
9. DataCite. DataCite metadata schema for the
publication and citation of research data; http://
10. Deutsch, A., Popa, L., and Tannen, V. Query reformulation
with constraints. SIGMOD Record 35, 1 (2006), 65–73.
11. Fan, W., Chan, C.-Y., and Garofalakis, M. Secure XML
querying with security views. In Proceedings of the 2004
ACM SIGMOD International Conference on Management
of Data. ACM Press, New York, 2004, 587–598.
12. Federation of Earth Science Information Partners
(ESIP). Data Citation Guidelines for Data Providers and
13. FORCE11. Data Citation Synthesis Group: Joint
Declaration of Data Citation Principles; https://www.
force11.org/datacitation (accessed Nov. 2015)
14. Halevy, A. Y. Answering queries using views: A survey.
The VLDB Journal 10, 4 (2001), 270–294.
15. International Council for Science Committee on Data
for Science and Technology. Data Citation Standards
and Practices, 2010; http://www.codata.org/task-groups/data-citation-standards-and-practices
(accessed Nov. 2015)
16. Lawrence, B., Jones, C., Matthews, B., Pepler, S.,
and Callaghan, S. Citation and peer review of data:
Moving towards formal data publication. International
Journal of Digital Curation 6, 2 (2011), 4–37.
17. Lenzerini, M. Data integration: A theoretical
perspective. In Proceedings of the 21st ACM
SIGMOD-SIGACT-SIGART Symposium on Principles
of Database Systems (Madison, WI, June 3–6).
ACM Press, New York, 2002, 233–246.
18. McCallum, I., Plag, H.-P., and Fritz, S. GEOSS data
citation guidelines: Version 2.0, 2012; http://www.
V2.0.pdf (accessed Nov. 2015)
19. Page, L., Brin, S., Motwani, R., and Winograd, T. The
PageRank Citation Ranking: Bringing Order to the Web.
Technical Report 1999-66. Stanford InfoLab, Stanford
University, Nov. 1999.
20. Pawson, A. J., Sharman, J. L. et al. The IUPHAR/
BPS Guide to PHARMACOLOGY: An expert-driven
knowledgebase of drug targets and their ligands.
Nucleic Acids Research 42, D1 (2014), D1098–D1106.
21. Pröll, S. and Rauber, A. Scalable data citation in
dynamic, large databases: Model and reference
implementation. In Proceedings of the 2013 IEEE
International Conference on Big Data (Santa Clara,
CA, Oct. 6–9). IEEE Press, 2013, 307–312.
22. Pröll, S. and Rauber, A. A scalable framework for
dynamic data citation of arbitrary structured data. In
Proceedings of the Third International Conference
on Data Management Technologies and Applications
(Vienna, Austria, Aug. 29–31, 2014), 223–230.
23. Research Data Alliance Working Group on Data
Citation. Making data citable: Case statement;
case-statement.html (accessed Nov. 2015)
24. Salomonson, V.V., Barnes, W., and Masuoka, E.J.
Introduction to MODIS and an overview of associated
activities. In Earth Science Satellite Remote Sensing:
Vol. 1: Science and Instruments. Springer, Berlin,
Heidelberg, Germany, 2006, 12–32.
25. Silvello, G. A methodology for citing linked open data
subsets. D-Lib Magazine 21, 1/2 (2015).
Peter Buneman is a professor in the
School of Informatics at the University of Edinburgh,
Edinburgh, U.K.
Susan Davidson is the Weiss
Professor of Computer and Information Science at the
University of Pennsylvania, Philadelphia, PA.
James Frew is an associate professor
of environmental informatics in the Bren School of
Environmental Science & Management at the University
of California, Santa Barbara, CA.
© 2016 ACM 0001-0782/16/09 $15.00