Follow us on Twitter at http://twitter.com/blogCACM
The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we’ll publish
selected posts or excerpts.
The Arbitrariness of Reviews, and Advice for
DOI: 10.1145/2732417 http://cacm.acm.org/blogs/blog-cacm
John Langford examines the results of the NIPS experiment, while Mark Guzdial considers the role of class size in teaching

January 8, 2015
Corinna Cortes (http://bit.ly/18I9RTK) and Neil Lawrence (http://bit.ly/1zy3Hjs) ran the NIPS experiment (http://bit.ly/1HNbXRT), in which one-tenth of the papers submitted to the Neural Information Processing Systems Foundation (NIPS, http://nips.cc/) went through the NIPS review process twice, and the accept/reject decisions were compared. This was a great experiment, so kudos to NIPS for being willing to do it and to Corinna and Neil for doing it.

The 26% disagreement rate presented at the NIPS conference (http://bit.ly/18Iaj4r) understates the meaning, in my opinion, given the 22% acceptance rate. The immediate implication is that one-half to two-thirds of the papers accepted at NIPS would have been rejected if reviewed a second time. For analysis details and discussion, see http://bit.ly/1uRCqCF.

Let us give P(reject in 2nd review | accept in 1st review) a name: arbitrariness. For NIPS 2014, arbitrariness was ~60%. Given such a stark number, the primary question is "what does it mean?"

Does it mean there is no signal in the accept/reject decision? Clearly not: a purely random decision would have an arbitrariness of ~78%. It is, however, notable that 60% is closer to 78% than to 0%.

Does it mean the NIPS accept/reject decision is unfair? Not necessarily. If a pure random number generator made the accept/reject decision, it would be "fair" in the same sense that a lottery is fair, and it would have an arbitrariness of ~78%.

Does it mean the NIPS accept/reject decision could be unfair? The numbers make no judgment here. It is a natural fallacy to imagine that random judgments derived from people imply unfairness, so I would encourage people to withhold judgment on this question for now.

Is arbitrariness of 0% the goal? Achieving 0% arbitrariness is easy: accept exactly those papers whose md5sum ends in 00 (in binary). Clearly, there is more to be desired from a reviewing process.

Perhaps this means we should decrease the acceptance rate? Maybe, but that makes sense only if you believe arbitrariness is good, as a lower acceptance rate will almost surely increase the arbitrariness. In the extreme case where only one paper is accepted, the odds of it being rejected on re-review are near 100%.

Perhaps this means we should increase the acceptance rate? If all submitted papers were accepted, the arbitrariness would be 0, but as mentioned earlier, arbitrariness of 0 is not the goal.

Perhaps this means NIPS is a broad conference with substantial disagreement among reviewers (and attendees) about what is important? Maybe. This seems plausible to me, given anecdotal personal experience. Perhaps small, highly focused conferences have a

Perhaps this means researchers submit to an arbitrary process for historical reasons? The arbitrariness is clear, the reason less so. A mostly arbitrary review process may be helpful in that it gives authors a painful-but-useful opportunity to debug easy ways to misinterpret their work. It may also be helpful in that it rejects the bottom 20% of papers that are actively wrong, and hence harmful to the process of developing knowledge. These reasons are not confirmed, of course.

Is it possible to do better? I believe the answer is "yes," but it should be understood as a fundamentally difficult problem. Every program chair who cares tries to tweak the reviewing process to be better, and there have been
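The arithmetic behind the ~60% and ~78% figures in Langford's post can be sketched in a few lines. This is a back-of-the-envelope check under assumptions of my own (a symmetric review process and an independent re-review), not code from the experiment itself; the inputs are the roughly 22% acceptance rate and the 26% disagreement rate quoted above.

```python
# Back-of-the-envelope model (my assumptions, not the experiment's code):
# both committees accept at the same rate, and disagreements are
# symmetric, so half of them are accept-then-reject.

accept_rate = 0.22    # approximate NIPS 2014 acceptance rate
disagreement = 0.26   # disagreement rate between the two committees

# Baseline: if accept/reject were a pure lottery at the same acceptance
# rate, an independent second review would reject any paper, accepted
# or not, with probability 1 - accept_rate.
random_arbitrariness = 1 - accept_rate

# Observed: under symmetry, P(accept 1st and reject 2nd) = disagreement/2,
# so P(reject 2nd | accept 1st) = (disagreement / 2) / accept_rate.
observed_arbitrariness = (disagreement / 2) / accept_rate

print(f"random baseline:        {random_arbitrariness:.0%}")   # 78%
print(f"observed arbitrariness: {observed_arbitrariness:.0%}")  # 59%
```

Under these assumptions, half the 26% disagreements are accept-then-reject, giving 0.13/0.22 ≈ 59%, consistent with the ~60% arbitrariness figure, while a pure lottery at the same acceptance rate sits near the 78% baseline.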