can ask U to retype a word that an OCR
program has failed to recognize (the
“payment”), thereby contributing to a
CS effort on digitizing written text (that
is, system B). This is the key idea behind
the reCAPTCHA project.34 The MOBS project12, 13 employs the same solution.
In particular, it ran experiments in which a user U could access a Web site (such as a class homepage) only after answering a relatively simple question (such as whether the string “1960” in “born in 1960” is a birth date). MOBS leverages the answers to
help build a data integration system.
This solution works best when the “payment” is unobtrusive or cognitively simple, to avoid deterring users from using system A.
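To make the gating idea concrete, the following minimal sketch (in Python) shows a reCAPTCHA-style “payment”: the user must correctly transcribe a control word whose answer is already known, and the paired answer for an unknown word is harvested for system B. Every identifier here (KNOWN_WORDS, unknown_votes, and so on) is hypothetical, invented for illustration; none of it is taken from the actual reCAPTCHA or MOBS code.

import random

# Control words with known transcriptions, and OCR failures awaiting labels.
KNOWN_WORDS = {"img_017": "clock", "img_042": "river"}
unknown_votes = {"img_101": [], "img_205": []}  # collected guesses per image

def serve_challenge():
    """Pair one known word (the test) with one unknown word (the work)."""
    return random.choice(list(KNOWN_WORDS)), random.choice(list(unknown_votes))

def submit(known_id, known_answer, unknown_id, unknown_answer):
    """Grant access to system A iff the control word is right; harvest
    the other answer as raw data for system B."""
    if known_answer.strip().lower() != KNOWN_WORDS[known_id]:
        return False                          # deny access to system A
    unknown_votes[unknown_id].append(unknown_answer.strip().lower())
    return True                               # "payment" accepted

def consensus(unknown_id, min_votes=3):
    """Accept a transcription into system B once enough users agree."""
    answers = unknown_votes[unknown_id]
    if len(answers) >= min_votes:
        top = max(set(answers), key=answers.count)
        if answers.count(top) > len(answers) / 2:
            return top
    return None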
The fifth solution is to piggyback on
the user traces of a well-established system (such as building a spelling correction system by exploiting user traces of
a search engine, as discussed previously). This gives us a steady stream of users. But we must still solve the difficult
challenge of determining how the traces can be exploited for our purpose.
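As one illustration of how such traces might be exploited (a sketch over invented session data, not how any search engine actually does it), consecutive queries in a user session that differ by only a few characters can be treated as evidence of a (typo, correction) pair; pairs reformulated by many independent users become correction rules.

from collections import Counter
from difflib import SequenceMatcher

# Hypothetical session log: each inner list is one user's consecutive queries.
sessions = [
    ["recieve mail", "receive mail"],
    ["recieve package", "receive package"],
    ["weather paris"],
]

def similar(a, b, threshold=0.8):
    """Near-identical consecutive queries suggest a typo and its manual fix."""
    return a != b and SequenceMatcher(None, a, b).ratio() >= threshold

corrections = Counter()
for queries in sessions:
    for q1, q2 in zip(queries, queries[1:]):
        if similar(q1, q2):
            corrections[(q1, q2)] += 1

for (typo, fix), count in corrections.most_common():
    print(f"{typo!r} -> {fix!r} ({count} users)")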
Once we have selected a recruitment strategy, we should consider
how to further encourage and retain
users. Many encouragement and retention (E&R) schemes exist. We briefly
discuss the most popular ones. First,
we can provide instant gratification, by
immediately showing a user how his or
her contribution makes a difference.16 Second, we can provide an enjoyable experience or a necessary service, such as game playing (while making a contribution).32 Third, we can provide ways to establish, measure, and show fame/trust/reputation.7, 13, 24, 25 Fourth, we can set up
competitions, such as showing top-rated users. Finally, we can provide
ownership situations, where a user may feel
he or she “owns” a part of the system,
and thus is compelled to “cultivate”
that part. For example, zillow.com displays houses and estimates their market prices. It provides a way for a house
owner to claim his or her house and
provide the correct data (such as the number of bedrooms), which in turn helps
improve the price estimation.
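As a concrete illustration of the reputation and competition schemes, the sketch below computes a simple reputation score from peer ratings and derives a top-k leaderboard. The scoring formula and the data are invented for illustration; real systems use far more elaborate trust models.

from collections import defaultdict

# Hypothetical contribution log: (user, peer rating on a 1-5 scale).
log = [("ann", 5), ("bob", 3), ("ann", 4), ("cid", 5), ("bob", 2)]

def reputation(log):
    """Mean peer rating with a small volume bonus, so prolific,
    well-rated users rank up; one arbitrary choice among many."""
    totals, counts = defaultdict(float), defaultdict(int)
    for user, rating in log:
        totals[user] += rating
        counts[user] += 1
    return {u: (totals[u] / counts[u]) * (1 + 0.1 * counts[u]) for u in totals}

def leaderboard(log, k=3):
    """Top-k users by reputation, suitable for public display."""
    scores = reputation(log)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(leaderboard(log))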
These E&R schemes apply naturally
to volunteering, but can also work well
for other recruitment solutions. For
example, after requiring a set of users
to contribute, we can still provide instant gratification, enjoyable
experience, fame management, and so on, to
maximize user participation. Finally,
we note that deployed CS systems often
employ a mixture of recruitment methods (such as bootstrapping with “
requirement” or “paying,” then switching to “volunteering” once the system
is sufficiently “mature”).
What contributions can users
make? In many CS systems the kinds
of contributions users can make are
somewhat limited. For example, to
evaluate, users review, rate, or tag; to
share, users add items to a central Web
site; to network, users link to other users; to find a missing boat in satellite
images, users examine those images.
In more complex CS systems, however, users often can make a far wider
range of contributions, from simple
low-hanging fruit to cognitively complex ones. For example, when building a structured KB, users can add a
URL, flag incorrect data, and supply
attribute-value pairs (as low-hanging
fruit).3, 5 But they can also supply inference rules, resolve controversial issues, and merge conflicting inputs (as cognitively complex contributions).25
The challenge is to define this range of
possible contributions (and design the
system such that it can gather a critical
crowd of such contributions).
Toward this goal, we should consider four important factors. First,
how cognitively demanding are the
contributions? A CS system often has
a way to classify users into groups,
such as guests, regulars, editors, admins, and “dictators.” We should take
care to design cognitively appropriate
contribution types for different user
groups. Low-ranking users (such as
guests, regulars) often want to make
only “easy” contributions (such as answering a simple question, editing one or two sentences, flagging an incorrect piece of data). If the cognitive load is
high, they may be reluctant to participate. High-ranking users (such as editors, admins) are more willing to make
“hard” contributions (such as resolving controversial issues).
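A minimal sketch of this first factor, with hypothetical rank names and task labels, might gate each contribution type by the minimum user rank allowed to attempt it, so that each group sees only cognitively appropriate work.

from enum import IntEnum

class Rank(IntEnum):
    GUEST = 0
    REGULAR = 1
    EDITOR = 2
    ADMIN = 3

# Hypothetical mapping from contribution type to the minimum rank that
# should see it; thresholds mirror the cognitive-load argument above.
MIN_RANK = {
    "answer_simple_question": Rank.GUEST,
    "flag_incorrect_data": Rank.GUEST,
    "edit_sentences": Rank.REGULAR,
    "supply_inference_rule": Rank.EDITOR,
    "resolve_controversy": Rank.ADMIN,
    "merge_conflicting_inputs": Rank.ADMIN,
}

def tasks_for(rank):
    """Offer a user group only the contribution types at or below its rank."""
    return [task for task, needed in MIN_RANK.items() if rank >= needed]

print(tasks_for(Rank.REGULAR))  # easy tasks only
print(tasks_for(Rank.ADMIN))    # everything, including "hard" tasks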
Second, what should be the impact
of a contribution? We can measure this by considering how the contribution affects the CS system. For example, editing a
sentence in a Wikipedia page largely
affects only that page, whereas revis-