Illustration by Paul Wearing
are especially prone to users “gaming”
the system, providing low-effort or
random answers in order to reduce the
time and work needed to get a reward.
How would you know whether a given
answer is a user’s honest opinion or
whether she is just clicking randomly?
My research partners and I first encountered this problem in 2007, when we tried to
get turkers to rate the quality of Wikipedia articles, a task that researchers have found difficult to perform with
automated tools. With high hopes, we
posted a number of articles to Mechanical Turk, paying turkers $0.05 to make
judgments about their quality and to
write what improvements they thought
the article needed. We got results
remarkably quickly: 200 ratings within
two days.
But when we looked at the results,
it was obvious that many people were
not even doing the task. Almost half
the responses to the writing question
were blank, uninformative, or just copied and pasted, and about a third spent
less than a minute on the task—less
time than is needed to read the article,
let alone rate it. This was not a promising beginning.
Is there something we could do to
improve the quality of responses and
reduce gaming? In traditional settings,
there are a number of mechanisms
that can mitigate gaming. For
example, experimenters have long
known that being in the same room
as a participant can lead to better
performance by the participant, as she
knows the experimenter is monitoring
her. Social norms, sanctions, and a
desire to avoid looking bad can promote
higher-quality contributions in groups
whose members know each other. The potential for
additional future work can motivate
high-quality work in the present, as
can formal or informal reputation
systems that enable future employers to
observe past performance. Job loss can
create hardship, since finding another
job takes time. Explicit contracts can
be enforced through legal systems,
imposing high costs on those who do not
honor them.