• The system can help complete
forms such as the PC member invitation form and the paper submission
form by suggesting likely colleagues
based on past collaboration history.
For these reasons, EasyChair and
EDAS are an immense contribution to
the academic community. According
to its Web page, EasyChair hosted over
3,300 conferences in 2010. Because of
its optimizations for multiconferences
and multitrack conferences, it is mandated for conferences and workshops
that participate in the Federated Logic
Conference (FLoC), a huge multiconference that attracts approximately
1,000 paper submissions.
Data privacy concerns
Accidental or deliberate disclosure. A
privacy concern with cloud-computing-based conference management
systems such as EDAS and EasyChair
arises because the system administrators are custodians of a huge quantity
of data about the submission and reviewing behavior of thousands of researchers, aggregated across multiple
conferences. This data could be deliberately or accidentally disclosed, with
unwelcome consequences.
• Reviewer anonymity could be compromised, as well as the confidentiality
of PC discussions;
• The acceptance success records
could be identified, for individual researchers and groups, over a period of
years; and
• The aggregated reviewing profile
(fair/unfair, thorough/scant, harsh/undiscerning, prompt/late, and so forth)
of researchers could be disclosed.
The data could be abused by hiring
or promotions committees, funding
and award committees, and more generally by researchers choosing collaborators and associates. The mere existence of the data makes the system
administrators vulnerable to bribery,
coercion, and/or cracking attempts. If
the administrators are also researchers, the data potentially puts them in
situations of conflict of interest.
The problem of data privacy in general is of course well known, but cloud
computing magnifies it. Conference
data is an example in our backyard.
When conference organizers had
to install the software from scratch,
there was still a risk of breach of confidentiality, but the data was just about
one conference. Cloud computing
solutions allow data to be aggregated
across thousands of conferences over
decades, presenting tremendous opportunities for abuse if the data gets
into the wrong hands.
Beneficial data mining. In addition
to the abuses of conference review data
described here, there are some uses
that might be considered beneficial.
The data could be used to help detect or
prevent fraud or other kinds of unwanted behavior, for example, by identifying:
• Researchers who systematically
unfairly accept each other’s papers, or
rivals who systematically reject each
other’s papers, or reviewers who reject
a paper and later submit to another
conference a paper with similar ideas;
and
• Undesirable submission patterns
and behaviors by individual researchers (such as parallel or serial submissions of the same paper; repeated paper withdrawals after acceptance; and
recurring content changes between
submitted version and final version).
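As a rough illustration of the first kind of analysis, the following Python sketch flags pairs of researchers who have each recommended acceptance of the other's papers at least a given number of times. The record layout (`reviewer`, `author`, `accepted`), the function name, and the threshold are assumptions made for the example; they do not describe how EasyChair or EDAS store or process data.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Review(NamedTuple):
    """One review record; the field names are hypothetical."""
    reviewer: str
    author: str
    accepted: bool  # True if the reviewer recommended acceptance


def reciprocal_accept_pairs(
    reviews: Iterable[Review], min_each_way: int = 2
) -> list[tuple[str, str]]:
    """Return sorted pairs (a, b) in which a recommended acceptance of b's
    papers and b recommended acceptance of a's papers at least
    min_each_way times in each direction."""
    # Count favorable recommendations per (reviewer, author) direction.
    accepts = Counter((r.reviewer, r.author) for r in reviews if r.accepted)
    pairs = set()
    for (reviewer, author), count in accepts.items():
        reverse = accepts.get((author, reviewer), 0)
        # "reviewer < author" reports each unordered pair once and skips
        # self-reviews.
        if reviewer < author and count >= min_each_way and reverse >= min_each_way:
            pairs.add((reviewer, author))
    return sorted(pairs)


# Made-up records, for illustration only.
sample_reviews = [
    Review("alice", "bob", True),
    Review("alice", "bob", True),
    Review("bob", "alice", True),
    Review("bob", "alice", True),
    Review("carol", "alice", True),
    Review("alice", "carol", False),
]
flagged = reciprocal_accept_pairs(sample_reviews)
print(flagged)
```

A pair flagged in this way is at most a candidate for human scrutiny. Researchers in a small field review each other's work often and legitimately, so a raw count like this says nothing about whether the recommendations were unfair.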
The data could also be used to understand and improve the way conferences
are administered. ACM, for example,
could use the data to construct quality
metrics for its conferences, enabling it
to profile the kinds of authors who submit, how much “new blood” is entering
the community, and how that changes
over different editions of the conference.
This could help identify conferences
that are emerging as dominant, or others that have outlived their usefulness.
The decisions about who is allowed
to mine the data, and for what purposes, are difficult. Policies should be
decided transparently and by consensus,
rather than being left solely to the de
facto data custodians.
Ways forward
Policies and legislation. An obvious
first step is to articulate clear policies
that circumscribe the ways in which
the data is used. For example, a simple
policy might be that the data gathered
during the administration of a conference should be used only for the management of that particular conference.
Adherence to this policy would imply
that the data is deleted after the conference, which is not done in the case
of EasyChair (I don’t know if it is done
for EDAS). Other policies might allow
wider uses of the data. Debate within
different academic communities can
be expected to yield consensus about
which practices are to be allowed in
a discipline, and which ones not. For
example, some communities may
welcome plagiarism detection based
on previously reviewed submissions,
while others may consider it useless for
their subject, or simply unnecessary.
Since its inception in 2002 and up to
the time of writing, EasyChair has appeared not to have any privacy policy,
or any statement about the purposes
and possible uses of the data it stores.
There is no privacy policy linked from
its main page, and a search for “privacy
policy” (or similar terms) restricted to
the domain “easychair.org” does not
yield any results. I have been told that
new users are presented with a privacy
statement at the time of first signing
up to EasyChair. I did not create a new
account to test this; regardless, the
privacy statement is not linked from
anywhere, nor can it be found later via search.
EDAS does have an easily accessed
privacy policy, which (while not watertight) appears to comply with the “use
only for this conference” principle.
Another direction would be to try
to find alternative custodians for the
data—custodians that are not themselves also researchers participating
actively in conferences. The ACM or
IEEE might be considered suitable,
although they contribute to decisions
about publications and appointments
of staff and fellows. Professional data
custodians such as Google might also
be considered. It may be difficult to
find an ideal custodian, especially if
cost factors are taken into account.