authors, or publishers of a database
have good ideas about how their data
should be cited. However, it is unlikely
that they will know how to associate a
citation with some complex SQL query,
and even less likely that the user of the
data, whose query was generated by
some user interface, will understand
what is wanted. In order to extract the
citation automatically from the query
Q and the database D, two questions
need to be answered:
• Does the citation depend on both Q
and D or just on the data Q(D) extracted
by Q from D?
• If we have appropriate citations for
some queries, can we use them to construct citations for other queries?
If the retrieved data is simply a
number or an image, one cannot
expect to find the citation in the retrieved data. Moreover, even if the
query returns nothing, it may be worthy of citation, but what citation is associated with the empty set? We need
at least context information; so we
need both Q and D.
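The point about empty results can be illustrated with a minimal sketch. It assumes a toy SQLite table and a hypothetical `Citation` record; the database name "ExampleDB", the release label, and the `cite` helper are illustrative assumptions, not part of any real citation system. Two different queries both return the empty set, so a citation derived from Q(D) alone could not tell them apart, whereas one built from Q and D can.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    database: str  # hypothetical database name
    version: str   # hypothetical release label
    query: str     # text of the query that was run
    rows: int      # number of rows the query retrieved


def cite(query: str, conn: sqlite3.Connection, name: str, version: str) -> Citation:
    """Build a citation from the query and its database context, not from the result alone."""
    rows = conn.execute(query).fetchall()
    return Citation(database=name, version=version, query=query, rows=len(rows))


# Toy database standing in for D (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE family (id INTEGER, name TEXT)")
conn.execute("INSERT INTO family VALUES (1, 'Calcitonin receptors')")

# Two distinct queries that both retrieve nothing.
q1 = "SELECT name FROM family WHERE id = 99"
q2 = "SELECT name FROM family WHERE name = 'none'"

c1 = cite(q1, conn, "ExampleDB", "2016.1")
c2 = cite(q2, conn, "ExampleDB", "2016.1")
```

Here `c1` and `c2` differ even though the retrieved data is identical (empty), because the citation records the query and the database context.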
The answer to the second question
is important because authors and publishers frequently have ideas as to how
to cite certain parts of the database;
that is, they can provide citations for
certain queries but do not know what
to do about other queries.
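One simple way to picture the second question is sketched below. It assumes that curators attach a citation to each "family" of rows (a view over the data) and that a citation for an arbitrary selection is assembled from the citations of the families it touches, falling back to a database-level citation when nothing is selected. The schema, the citation strings, and the `cite_query` helper are all hypothetical; this is an illustration of the idea, not the authors' method.

```python
from typing import Callable, Dict, List, Tuple

# (family_id, family_name, target_name): an assumed toy schema.
Row = Tuple[int, str, str]

ROWS: List[Row] = [
    (1, "Calcitonin receptors", "CT receptor"),
    (1, "Calcitonin receptors", "AMY1 receptor"),
    (2, "Adenosine receptors", "A1 receptor"),
]

# Curator-supplied citations, each attached to the view "all rows of one family".
VIEW_CITATIONS: Dict[int, str] = {
    1: "Hypothetical Author A. Calcitonin receptors. ExampleDB.",
    2: "Hypothetical Author B. Adenosine receptors. ExampleDB.",
}

# Fallback used when a query selects no rows (hypothetical).
DATABASE_CITATION = "ExampleDB, hypothetical release 2016.1."


def cite_query(predicate: Callable[[Row], bool], rows: List[Row]) -> List[str]:
    """Combine the citations of every family touched by the selected rows."""
    families = sorted({row[0] for row in rows if predicate(row)})
    if not families:
        # An empty result still gets a citation from the database context.
        return [DATABASE_CITATION]
    return [VIEW_CITATIONS[family] for family in families]


one_family = cite_query(lambda row: row[0] == 1, ROWS)
both_families = cite_query(lambda row: row[2].endswith("receptor"), ROWS)
nothing = cite_query(lambda row: row[2] == "no such target", ROWS)
```

A query confined to one family inherits that family's citation, a query spanning families inherits both, and an empty selection falls back to the database-level citation.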
Numerous organizations [2, 6, 12, 16] have
advocated data citation and developed
principles [2–4, 7, 8, 12, 13, 15] that refine and
standardize the notion [1, 3, 4, 8, 9, 18]. The
purpose of these standards is mostly
to prescribe the information in a citation—the snippets—and also to define
A major, but not the only, purpose
of a citation is to identify the cited material,
and citation is often linked to
database? Here, we use the term “database”
in a broad sense and “query” to
mean any mechanism used to extract
the data, such as a set of file names, an
SQL query, a URL, or a special-purpose
GUI. The computational problem this
poses can be broadly and simply formulated:
Given a database D and a query Q,
generate an appropriate citation.
It is often the case that the curators,
Figure 1. GtoPdb family and introductory pages with independent citations.
Figure 2. The MODIS grid, with highlighted tiles (red) of spatial extent for California
(green), with citation.