least one correct guess, and with all incorrect guesses. We observed different
behavior at the three conferences: ASE
submissions were accepted at statistically the same rate regardless of reviewer guessing behavior. Additional data
available for ASE shows that for each review’s paper rating (strong accept, weak
accept, weak reject, strong reject), there
were no statistically significant differences in acceptance rates for submissions with different guessing behavior.
OOPSLA and PLDI submissions with no
guesses were less likely to be accepted
(p ≤ 0.05) than those with at least one
correct guess. PLDI submissions with
no guesses were also less likely to be accepted (p ≤ 0.05) than submissions with
all incorrect guesses (for OOPSLA, for
the same test, p = 0.57). One possible
explanation is that OOPSLA and PLDI
reviewers were more likely to affiliate
work they perceived as of higher quality
with known researchers, and thus more
willing to guess the authors of submissions they wanted to accept.
How do reviewers deanonymize?
OOPSLA and PLDI reviewers were asked
if the use of citations revealed the authors. Of the reviews with guesses, 37%
(11% of all reviews) and 44% (11% of all
reviews) said they did, respectively. The
ASE reviewers were asked what informed their guesses. The answers were guessing based on paper topic (75 responses); obvious unblinding via reference to previous work, dataset, or source code (31); having previously reviewed or read a draft (21); or having seen a talk (3).
The results suggest that some deanonymization may be unavoidable. Some
reviewers discovered GitHub repositories or project websites while searching
for related work to inform their reviews.
Some submissions represented clear extensions of or indicated close familiarity
with the authors’ prior work. However,
there also exist straightforward opportunities to improve anonymization. For example, community familiarity with anonymization, consistent norms, and clear
guidelines could address the incidence
of direct unblinding. However, multiple
times at the PC meetings, the PC chairs
heard a PC member remark about having been sure another PC member was a
paper author, but being wrong. Reviewers may be overconfident, and sometimes wrong, when they think they know
an author through indirect unblinding.
cept the Z reviewers for PLDI were statistically significantly correct less often
than the X and Y reviewers (p ≤ 0.05).
We conclude that reviewers who considered themselves experts were more
likely to guess author identities, but
were no more likely to guess correctly.
Are papers frequently poorly anonymized? One possible reason for deanonymization is poor anonymization.
Poorly anonymized papers may have
more reviewers guess, and also a higher
correct guess rate. Figure 3 shows the distribution of papers by the number of reviewers who attempted to guess the authors. The largest proportion of papers
(26%–30%) had only a single reviewer attempt to guess. Fewer papers had more
guesses. The bar shading indicates the
fractions of the author identity guesses
that are correct; papers with more guesses
have lower rates of incorrect guesses.
Combining the three conferences’ data, the χ² statistic indicates that the rates of correct guessing for papers with one, two, and three or more guesses are statistically significantly different (p ≤ 0.05). This comparison is also statistically significant for OOPSLA alone, but not for ASE and PLDI. Comparing guess rates (we use one-tailed z tests for all population proportion comparisons) between paper groups directly: for OOPSLA, the rate of correct guessing is statistically significantly different between one-guess papers and each of the other two paper groups; for PLDI, the same is true between one-guess and three-plus-guess paper groups. This evidence suggests a minority of papers may be easy to unblind. For ASE, only 1.5% of the papers had three or more guesses, while for PLDI, 13% did. However, for PLDI, 40% of all the guesses corresponded to those 13% of the papers, so improving the anonymization of a relatively small number of papers could significantly reduce the number of guesses. Since the three conferences only recently began using double-blind review, occurrences of insufficient anonymization are likely to decrease as authors gain experience anonymizing submissions, further increasing double-blind effectiveness.
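The χ² comparison above can be sketched as follows. The contingency counts here are hypothetical, since the article reports only guess rates; the statistic is compared against the χ² critical value for the table’s degrees of freedom (5.991 for df = 2 at α = 0.05).

```python
def chi_squared(table):
    """Pearson's chi-squared statistic for a contingency table,
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: papers with 1, 2, and 3+ guesses; columns: [correct, incorrect]
# guesses. These counts are made up for illustration.
table = [[45, 40], [30, 15], [25, 5]]
stat = chi_squared(table)
dof = (len(table) - 1) * (len(table[0]) - 1)   # (rows-1) * (cols-1) = 2
print(f"chi2 = {stat:.2f}, dof = {dof}")       # significant if chi2 > 5.991
```

The same computation is available as `scipy.stats.chi2_contingency`, which also returns the p-value directly.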
Are papers with guessed authors
more likely to be accepted? We investigated if paper acceptance correlated
with either the reviewers’ guesses or
with correct guesses. Figure 4 shows
the acceptance rate for each conference for papers without guesses, with at
Figure 3. Distributions of papers by number of guesses. The bar shading indicates the
fraction of the guesses that are correct.
[Bar chart: for each conference (ASE, OOPSLA, PLDI), the percentage of papers with one, two, and three or more guesses (y-axis 0%–30%); bar shading distinguishes correct from incorrect guesses.]
Figure 4. Acceptance rate of papers by reviewer guessing behavior.
Papers with                    ASE      OOPSLA    PLDI
No guesses                     21.2%    20.7%     6.8%
At least one correct guess     22.0%    31.6%     22.3%
All guesses incorrect          23.0%    25.0%     25.0%
All papers                     21.3%    26.5%     16.7%