coverage as well [1, 4, 10, 11, 18].
Meho and Rogers [10] concluded in 2008 that the choice between
WoS and Scopus did not significantly affect the citation-based
ranking of human-computer interaction researchers.
However, in a case study of library and information researchers,
Meho and Yang [11] observed the opposite, concluding that findings
for one scientific domain cannot be generalized to other domains.
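Whether the choice of database significantly changes a ranking is usually quantified with a rank correlation. The sketch below uses Spearman's rho as one common choice. This is an assumption for illustration, not necessarily the statistic Meho and Rogers used. The researcher names, counts, and helper functions are hypothetical.

```python
def ranks(counts: dict[str, int]) -> dict[str, int]:
    """Rank researchers by citation count, 1 = most cited (assumes no ties)."""
    ordered = sorted(counts, key=lambda name: counts[name], reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def spearman_rho(a: dict[str, int], b: dict[str, int]) -> float:
    """Spearman's rank correlation over researchers present in both databases.

    Uses the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)), valid without ties.
    """
    common = sorted(a.keys() & b.keys())
    n = len(common)
    if n < 2:
        raise ValueError("need at least two researchers in common")
    ra = ranks({k: a[k] for k in common})
    rb = ranks({k: b[k] for k in common})
    d2 = sum((ra[k] - rb[k]) ** 2 for k in common)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical citation counts for the same four researchers in two databases.
wos = {"R1": 120, "R2": 95, "R3": 40, "R4": 12}
scopus = {"R1": 130, "R2": 88, "R3": 51, "R4": 10}
print(spearman_rho(wos, scopus))  # 1.0: the counts differ but the order is identical
```

A rho near 1 means the two databases order researchers the same way even when their raw counts differ, which is the sense in which the database choice "did not significantly affect" a ranking.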
Complementing coverage studies, this article explores the inaccuracy
of citation records and its effect on the perceived impact of CS
conferences and on author ranking.
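The h-index is one widely used citation-based metric. It is chosen here only for illustration, because the article does not single it out. The sketch below shows how inaccurate records can reorder authors. The helper `h_index` and the citation lists are hypothetical.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h of the papers have at least h citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


# Hypothetical true per-paper citation counts for two authors.
true_a, true_b = [10, 8, 5, 4, 3], [9, 7, 6, 3, 1]
# Suppose a database misses some citations to A's three least-cited papers
# but records all of B's citations correctly.
recorded_a, recorded_b = [10, 8, 2, 2, 1], [9, 7, 6, 3, 1]

print(h_index(true_a), h_index(true_b))          # 4 3 -> A ranks first
print(h_index(recorded_a), h_index(recorded_b))  # 2 3 -> B ranks first
```

The reordering here comes from the unevenness of the undercitation: only A's records are incomplete.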
Figure 1 outlines the difference between coverage and accuracy for an
author of a book B and a journal article J1, whose citations are shown
at the top of the figure. B is cited by another article J2 and by a
technical report TR. J1 is cited by a conference paper C. TR is a
preliminary version of C, with the same title and authors. Because the
reference list of C is shorter than that of TR, TR cites B but C does not.
The middle segment of the figure shows GS citation records, with GS
mistakenly attributing the citations by TR to C. GS also covers
publications of lesser importance (such as books). Because some of its
records are erroneous and, for certain policies, irrelevant, the GS
citation count in this example is not reliable.
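The Figure 1 example can be modeled as sets of (citing, cited) pairs. This makes the difference between coverage and accuracy explicit: a database can report the right total while holding the wrong records. The Google Scholar edge set below encodes one reading of the text, namely that TR's citation of B is credited to C instead; that reading is an assumption. The function names are hypothetical.

```python
# Each citation is a (citing, cited) pair, following the Figure 1 example.
REALITY = {("J2", "B"), ("TR", "B"), ("C", "J1")}
# Assumed GS records: the citation by TR is credited to C instead.
GOOGLE_SCHOLAR = {("J2", "B"), ("C", "B"), ("C", "J1")}
AUTHOR_PUBLICATIONS = {"B", "J1"}


def citation_count(edges, publications):
    """Number of recorded citations to any of the given publications."""
    return sum(1 for _, cited in edges if cited in publications)


def compare(truth, records):
    """Split a database's records into correct, spurious, and missing ones."""
    return {
        "correct": truth & records,
        "spurious": records - truth,
        "missing": truth - records,
    }


print(citation_count(REALITY, AUTHOR_PUBLICATIONS))         # 3
print(citation_count(GOOGLE_SCHOLAR, AUTHOR_PUBLICATIONS))  # 3
print(compare(REALITY, GOOGLE_SCHOLAR)["spurious"])         # {('C', 'B')}
```

Both totals are 3. Yet one of the three Google Scholar records is spurious and one real citation (TR citing B) is absent, so matching raw counts say little about whether the records behind them are accurate.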
WoS records are shown in the bottom segment of the figure. First, WoS
does not index less-important manuscripts (such as TR and B). However,
WoS sometimes does keep track of citations to them by indexed papers
(such as the citation of B by J2). With the manual-count method of Meho
and Rogers [10], these citations can still be counted, should
citation-analysis policy demand it. Second, the citations of B by C and
of J1 by J2 were never added to WoS. For policies that neglect citations
by papers not indexed in WoS, the missing citation of B does not matter,
but the missing citation of J1 always matters: both the citing and the
cited paper have the WoS stamp of approval, so the citation should be
counted. When the database lacks a correct record of such a citation,
as in this example, the citation is not counted, and the author suffers
professionally from undercitation.

Figure 1. Publications (vertices) and citations (edges) recorded by
databases. Panels, top to bottom: publication reality, Google Scholar,
Web of Science.
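The policy described above counts a citation only if both the citing and the cited paper are indexed. It can be expressed as a filter on citation pairs. This is a minimal sketch with hypothetical paper identifiers (P1 to P3 indexed, R1 a nonindexed report) and hypothetical function names. It also assumes the true citation pairs are known, which in practice they are not.

```python
def relevant(edges, indexed):
    """Citations that count under a policy requiring both the citing and
    the cited paper to be indexed in the database."""
    return {(src, dst) for src, dst in edges if src in indexed and dst in indexed}


def undercitation(true_edges, db_edges, indexed):
    """Relevant citations that really exist but the database never recorded."""
    return relevant(true_edges, indexed) - set(db_edges)


# Hypothetical example: P1..P3 are indexed, R1 (a report) is not.
indexed = {"P1", "P2", "P3"}
true_edges = {("P2", "P1"), ("P3", "P1"), ("R1", "P1"), ("P3", "P2")}
db_edges = {("P2", "P1"), ("P3", "P2")}
print(sorted(undercitation(true_edges, db_edges, indexed)))  # [('P3', 'P1')]
```

The missing citation of P1 by P3 is reported because both papers are indexed, mirroring the J1 case. The missing citation by R1 is ignored, just as such a policy would ignore it.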
The study by Meho and Yang [11] on library and information researchers
reported that, at the time of the study, 0.5%, 4.4%, and 12% of relevant
citations were missing from GS, Scopus, and WoS, respectively, because
of database errors.
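To put these percentages in concrete terms, the sketch below applies them to a hypothetical author with 200 relevant citations. Treating a study-wide missing rate as if it applied uniformly to one author is a simplifying assumption. As noted below, undercitation in fact varies between individual authors.

```python
# Missing-citation rates reported by Meho and Yang, as quoted in the text.
MISSING_RATE = {"GS": 0.005, "Scopus": 0.044, "WoS": 0.12}


def expected_recorded(true_citations: int, rate: float) -> float:
    """Citations a database would record if it missed a fraction `rate`."""
    return true_citations * (1 - rate)


for database, rate in MISSING_RATE.items():
    recorded = expected_recorded(200, rate)
    print(f"{database}: {recorded:.1f} recorded, {200 - recorded:.1f} missing")
# GS: 199.0 recorded, 1.0 missing
# Scopus: 191.2 recorded, 8.8 missing
# WoS: 176.0 recorded, 24.0 missing
```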
Here, we evaluate undercitation resulting from such errors.
Complementing the studies mentioned earlier, we recently uncovered a
significant undercitation bias in Scopus and WoS against the CS
conferences they cover, showing how it weakens the CS community's effort
to win greater appreciation for conference papers. We also found that
variations in the undercitation of individual authors make the ACM
Digital Library (DL), Scopus, and WoS unreliable information sources
for citation-based metrics. Finally, we present an automated method that
combines the coverage of GS with the quality assurance of Scopus and
WoS to detect undercitation resulting from missing citations.
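This passage does not spell out the detection method itself. To give a rough idea of how GS coverage might be combined with WoS quality assurance, the sketch below flags papers that meet three conditions: GS lists them as citing a given paper, WoS indexes them, and WoS does not record them as citing that paper. The title-based matching and all names here are hypothetical simplifications, not the article's method.

```python
import re


def normalize(title: str) -> str:
    """Lowercase a title and keep only alphanumeric words, so minor
    punctuation and capitalization differences do not block a match."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def suspected_missing(gs_citing, wos_indexed, wos_citing):
    """Titles GS lists as citing a paper that WoS indexes but does not
    record as citing it (candidate undercitation, to be verified)."""
    indexed = {normalize(t) for t in wos_indexed}
    recorded = {normalize(t) for t in wos_citing}
    candidates = {normalize(t) for t in gs_citing}
    # Only citers indexed in WoS are considered, so coverage gaps are ignored.
    return sorted(t for t in candidates if t in indexed and t not in recorded)


# Hypothetical titles of papers citing one WoS-indexed paper.
gs_citing = ["Fast Graph Mining", "A Survey of Citation Analysis", "Tech Report on X"]
wos_indexed = ["Fast graph mining.", "A survey of citation analysis"]
wos_citing = ["A Survey of Citation Analysis"]
print(suspected_missing(gs_citing, wos_indexed, wos_citing))  # ['fast graph mining']
```

Flagged titles are candidates, not confirmed omissions. Take TR and C in Figure 1, which share a title: GS's misattributed citation of B by C would be flagged by pure title matching even though C does not cite B. Each candidate therefore still has to be checked against the citing paper's actual reference list.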
We do not question Scopus or WoS coverage. The analyses we perform for
any such database involve only publications indexed in that database,
so all undercitation results presented here are independent of database
coverage. Moreover, we do not take a position for or against
citation-based metrics, though their usefulness has been questioned [12]
and many refinements have been proposed [13, 14].
Our results demonstrate only that unless a corrective method is used, as
we do here, to correct raw counts obtained from Scopus and WoS, their in-