Figure 1: With VizWiz, blind people take photos using their mobile phones and
submit them alongside a question spoken into the phone, shown here above
each image. A crowd of anonymous users replies, with answers shown below each
image and response times given in seconds in parentheses.
ers answer questions they have about
things around them that they cannot
see. The blind person takes a photograph with a smartphone’s camera,
records a spoken question (also using
the phone), and then uploads the query
and picture to a crowd of sighted users
on the net who are better able to answer
it (see Figure 1). For example, if a blind
person grabs a can out of her cupboard
but has forgotten what’s inside it, she
can snap a photo of the can and its label, upload it, and ask the sighted users
what’s in the can.
A related system, Sinch [7], draws on
the crowd to provide assistance to web-enabled mobile device users who have
situational disabilities, such as the
limited ability to read a small screen,
arthritis or hand tremors that make it
difficult to click on small web page targets, and slow networks. With Sinch,
mobile users speak a question into
their phone, and the crowd searches
the web for answers using its more
capable desktop web access, returning web pages with the requested
information highlighted.
Another reason to use a crowd is
the “many eyes” principle, which has
been claimed as an advantage of open-source software development (the
complete phrase is “many eyes make
bugs shallow”). We have exploited this
principle in Soylent [2], a Microsoft
Word extension that uses a crowd for
proofreading, shortening, and repetitive editing. A typical run of Soylent
may have dozens of people looking at
each paragraph of a document, finding
errors that a single writer might miss.
In fact, a conference paper submitted
about Soylent contained a grammatical error that was overlooked by not
only Word’s built-in grammar checker,
but also eight authors and six reviewers. However, when we passed the paper through Soylent, the crowd caught
the error.
A corollary of the many eyes principle is diversity. The fact is, a crowd
comprises a wide range of ideas, opinions, and skills. For example, in Soylent,
the system not only identifies writing
errors, but also suggests multiple ways
to fix them. It can suggest text to cut
to save space—a tough task even for
skilled authors, who are often reluctant to make cuts. Soylent can typically trim text down to 85 percent of its
original length, without changing the
meaning of the text or introducing errors (see Figure 2).
Figure 1 panels (question, then crowd answers with response times):
What color is this pillow? (89s) I can’t tell. (105s) multiple shades of soft green, blue and gold
What denomination is this bill? (24s) 20 (29s) 20
Do you see picnic tables across the parking lot? (13s) no (46s) no
What temperature is my oven set to? (69s) it looks like 425 degrees but the image is difficult to see. (84s) 400 (122s) 450
Can you please tell me what this can is? (183s) chickpeas. (514s) beans (552s) Goya Beans
PROGRAMMING
Prototyping a human computation system is hard if you have to entice a crowd
to visit your website. Games With a Purpose handles this by making the experience fun—but not all human computation systems are fun enough to
be self-motivating, particularly at the
prototyping stage. Mechanical Turk is
a good prototyping platform for many
forms of human computation, because
it offers a ready service for recruiting a
crowd on demand. The first prototypes for VizWiz and Soylent were built
on Mechanical Turk.
Yet thinking about programming
with human beings inside the system
poses special problems. For example,
with Mechanical Turk, a request for
a human to do a small task can take
a few minutes and cost a few cents to
get a result, which is astounding in one
sense (that you can obtain human assistance so quickly and so cheaply), but
is abysmally slow and expensive compared to a conventional function call.
Programmers need new tools that
can help them experiment with human
computation in their systems. For example, our TurKit toolkit [3] integrates
Mechanical Turk calls in a traditional
imperative/object-oriented programming paradigm, so that programmers
can write algorithms that incorporate
human computation in a familiar way.
TurKit does this using a novel programming model called “crash and
rerun,” which is suited to long-running
distributed processes where local computation (done by software) is cheap,
and remote work (done by humans) is
costly.
The insight of crash-and-rerun programming is that if our program crashes, it is cheap to rerun the entire program up to the place where it crashed.
This is true as long as rerunning does
not repeat the costly external
operations from the previous run. To ensure this, the program records
information in a database every time a
costly operation is executed.
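The idea can be illustrated with a minimal Python sketch. This is not TurKit's actual API: the names CrashAndRerun, once, Crash, describe, improve, and program are hypothetical, a SQLite table stands in for the database, and results are keyed by the order in which costly operations are reached, which assumes the program is deterministic between runs.

```python
import json
import sqlite3


class Crash(Exception):
    """Raised to abandon the current run, e.g. when a human result is not ready."""


class CrashAndRerun:
    """Records the result of each costly operation so that reruns can replay it."""

    def __init__(self, db_path=":memory:"):
        # A file path instead of ":memory:" would let results survive a real crash.
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results "
            "(step INTEGER PRIMARY KEY, value TEXT)"
        )
        self.step = 0

    def begin_run(self):
        # Every run starts from the top of the program.
        self.step = 0

    def once(self, costly_fn):
        """Run costly_fn at most once across all runs; afterwards replay its result."""
        step = self.step
        self.step += 1
        row = self.db.execute(
            "SELECT value FROM results WHERE step = ?", (step,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cheap replay of the recorded result
        value = costly_fn()  # may raise Crash, in which case nothing is recorded
        self.db.execute(
            "INSERT INTO results (step, value) VALUES (?, ?)",
            (step, json.dumps(value)),
        )
        self.db.commit()
        return value


# Stand-ins for costly human tasks; the counters show how often each really runs.
counts = {"describe": 0, "improve": 0}


def describe():
    counts["describe"] += 1
    return "draft text"


def improve(text, ready):
    counts["improve"] += 1
    if not ready:
        raise Crash("human result not available yet")
    return text + ", proofread"


def program(memo, ready):
    """An ordinary imperative program whose costly steps are wrapped in once()."""
    memo.begin_run()
    draft = memo.once(describe)
    return memo.once(lambda: improve(draft, ready))
```

In this sketch, a first call to program(memo, ready=False) raises Crash after describe has been recorded. Calling program(memo, ready=True) afterwards replays the recorded draft instead of invoking describe again, and only the unfinished step is executed.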
Costly operations are marked by a