3. Improve existing review processes.
Approaches listed in Table 3 have been noted in Communications commentaries. We are not endorsing them all; some conflict with one another, and many could be used together. Some are in regular use at certain conferences; others have been tried but did not take root.
Conference size is an important variable in assessing the utility of these approaches. Some
prestigious conferences attract fewer
than 150 submissions and might have
a flat program committee. Some attract more submissions and enlist a
second tier of external (or ‘light’) reviewers. The largest can have three
levels, with tracks or subcommittees.
Potential pairwise comparisons grow quadratically with the number of submissions; approaches that work at one size may not scale up or down.
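To make that growth concrete, a back-of-the-envelope calculation (our illustration; these figures do not appear in the commentaries):

\[
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad \binom{150}{2} = 11{,}175, \qquad \binom{600}{2} = 179{,}700,
\]

so a fourfold increase in submissions multiplies the potential pairwise comparisons roughly sixteenfold.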
Tracking submission histories. The International Conference on Functional Programming (ICFP) allows authors of papers previously rejected (by ICFP or other conferences) to append the review history and comments. Eleven percent of ICFP 2011 authors declared a history, and half of those provided annotated reviews. Half were re-reviewed by one of the original reviewers, and all except one were accepted. Although mandating that authors report a paper's prior submission history would face practical and ethical challenges, when authors opt to do it the result can be a win-win for authors and reviewers.
Table 3. Proposals for improving conference reviewing outcomes.

Track submission histories.
Streamline the review process.
Adopt double-blind reviewing.
Clarify review criteria.
Improve reviewer match.
Control for reviewer differences.
Write more constructive reviews.
Reduce feedback to authors.
Allow author rebuttals.
Stage a shadow PC meeting.
Mentor or shepherd submissions.
Publish reviews.
Improve presentation quality.
Streamlining. In phased reviewing, submissions first receive two or three reviews; some are then rejected, and additional reviewers are assigned to the rest. Often all results are announced together, but since 2009 EuroSys has notified the authors of papers rejected in the first phase right away, as CSCW 2012 did after its first round, enabling those authors to resume work quickly. Conferences have also experimented with various ways of ordering papers for discussion in the committee: randomly, periodically inserting highly rated or low-rated papers, starting at the top, or starting at the bottom. No consensus has emerged, although some report that papers first discussed late in the day tend to fare poorly whatever their rating.
Double-blind reviewing. Evidence indicates that reviewing is fairer when authors are anonymous, and the practice has spread. Anonymizing a submission is sometimes awkward for authors. Because anonymity can keep program committee members from detecting duplication or extreme incrementalism, some two-tier committees blind only the less influential external reviewers.
Clarifying review criteria. Reviewers have been asked to rate papers on diverse dimensions: originality, technical completeness, audience interest, strengths and weaknesses, and so on. In our experience, committees working under time pressure focus on the overall rating; once a conference reaches a moderate size, writing quality gets some attention and nothing else does.
Matching reviewers to papers. In general, the smaller the conference, the more easily reviewer assignments can be tuned. Keyword or topic-area lists are common. Matching semantic analyses of a reviewer's own work to submissions has been tried. Some conferences let reviewers bid on submissions based on titles and abstracts. CSCW 2013 authors could nominate associate chairs (ACs) for their papers; although no promises were made, the nominations proved helpful. IUI twice let program committee members choose which submissions to review; an absence of reviewer interest was a factor in the final decision.
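As an illustration of the semantic-matching idea, here is a minimal sketch that scores reviewer-submission pairs by TF-IDF cosine similarity. The profiles, abstracts, and function name are our own hypothetical examples; a real assignment system would also balance reviewer load and screen for conflicts of interest.

```python
# A minimal sketch of semantic reviewer-paper matching (illustrative only).
# Assumes each reviewer is represented by text drawn from their own work.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def match_scores(reviewer_profiles, submission_abstracts):
    """Return a (reviewers x submissions) matrix of similarity scores."""
    vectorizer = TfidfVectorizer(stop_words="english")
    # Fit one shared vocabulary over reviewer profiles and submissions.
    tfidf = vectorizer.fit_transform(reviewer_profiles + submission_abstracts)
    n = len(reviewer_profiles)
    return cosine_similarity(tfidf[:n], tfidf[n:])

# Hypothetical data: higher scores suggest better topical fit.
scores = match_scores(
    ["type inference for functional languages",
     "crowdsourcing and social computing in distributed teams"],
    ["A study of awareness in distributed teams",
     "Gradual typing for functional languages"],
)
print(scores.round(2))  # each row is a reviewer, each column a submission
```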
Normalizing to control for consistently negative or positive reviewers. Since 2006, Neural Information Processing Systems (NIPS) has calculated a statistical normalization to offset consistently high or low reviewer biases. Other conferences have tried this, usually just once. It does not counter biases directed at particular topics or methods, the occasional reviewer who gives only top and bottom ratings "to make a real difference," or those who uniformly give middling reviews. Nor does it produce an advocate: knowing that a reviewer is inherently negative does not replace harsh critiques with positive points. Normalization may be more useful for smaller conferences with fewer papers to discuss. Another approach, tried once by SIGCOMM (2006), SOSP (2009), and other conferences, had reviewers rank submissions. Although the rankings were used primarily to create a discussion order, relative judgments could counter a reviewer's overall positive or negative bias.
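To make the normalization idea concrete, here is a minimal sketch assuming simple per-reviewer z-scores; NIPS's actual procedure is a fitted statistical model, so treat this toy version as illustrative only.

```python
# Illustrative per-reviewer score normalization (not NIPS's actual model).
# Centers each reviewer's scores on their own mean and scales by their own
# spread, offsetting consistently harsh or generous raters.
from collections import defaultdict
from statistics import mean, stdev

def normalize(reviews):
    """reviews: list of (reviewer, paper, score) tuples -> dict of z-scores."""
    by_reviewer = defaultdict(list)
    for reviewer, _, score in reviews:
        by_reviewer[reviewer].append(score)
    z = {}
    for reviewer, paper, score in reviews:
        scores = by_reviewer[reviewer]
        mu = mean(scores)
        # Guard against a reviewer with a single score or identical scores.
        sigma = stdev(scores) if len(scores) > 1 else 1.0
        z[(reviewer, paper)] = (score - mu) / (sigma or 1.0)
    return z

# Hypothetical ratings on a 1-5 scale: reviewer "a" is harsh, "b" generous;
# after normalization their relative judgments of p1 and p2 agree.
print(normalize([("a", "p1", 2), ("a", "p2", 3), ("b", "p1", 4), ("b", "p2", 5)]))
```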