ect (http://emeryberger.com/research/automan/),3 as well as both academic
and commercial efforts to automate
workflow in crowdsourcing and social
computing systems (see, for example,
http://groups.csail.mit.edu/uid/turkit/
and http://www.crowdflower.com/).
We note that the organizational schemes in most of the successful crowdsourcing examples to date share much in common. The tasks to be performed (for example, building an online encyclopedia, labeling images for their content, creating a network of website bookmark labels, finding surveillance balloons) are obviously parallelizable, and the basic unit of human contribution required is extremely small (fix some punctuation, label an image, and so on). Moreover, there is usually very little coordination required between the contributions. The presence of these commonalities is a source of optimism for the Crowdsourcing Compiler: so far, there seems to be some shared structure to successful crowdsourcing that the compiler might codify. But are such commonalities present because they somehow delineate fundamental limitations on successful crowdsourcing, or simply because this is the “low-hanging fruit”?
Today, the Crowdsourcing Compiler
is clearly a “blue sky” proposal meant
more to delineate an ambitious research agenda for social computation
than to serve as a guide to short-term
steps. But we believe that such an agenda would both need and drive research
on theoretical foundations. First steps
toward developing the mathematical
foundations of a Crowdsourcing Compiler include formally addressing the
following questions:
˲ For a given set of assumptions
about the volunteer force, and given
the nature of the task, what is the best
scheme for organizing the volunteers
and their contributions? For instance,
is it a “flat” scheme where all contributors are equal and their contributions
are combined in some kind of majority
vote fashion? Or is it more hierarchical,
with proven and expert contributors
given higher weight and harder subproblems? Which of these (or other)
schemes should be used under what assumptions on the nature of the task and
what assumptions on the volunteers?
˲ How can we design crowdsourced systems for solving tasks that are much more challenging and less “transactional” than what we currently see in the field, for instance, complex problems where there are strong constraints and interdependencies between the contributions of different volunteers? Behavioral research in recent years has shown that groups of humans can indeed excel at such tasks,31 but we are far from understanding when and why.
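The contrast in the first question between a “flat” majority vote and a scheme that gives proven contributors more weight can be made concrete. The Python sketch below is a minimal illustration, not a scheme proposed in this article: it assumes a single labeling task with a binary answer, and the function names and the accuracy figures in the example are hypothetical. For the weighted version it uses the log-odds weight log(p / (1 - p)), a standard choice when each worker is assumed to answer independently and to be correct with a known probability p; in practice p would itself have to be estimated, for example from questions with known answers.

```python
import math
from collections import Counter, defaultdict


def majority_vote(labels):
    """Flat scheme: every contributor counts equally.

    Ties go to the label that appears first, because Counter.most_common
    orders equal counts by first occurrence.
    """
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def accuracy_to_weight(p):
    """Log-odds weight for a worker assumed to be correct with probability p.

    A worker at chance (p = 0.5) gets weight 0; more reliable workers get more say.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("accuracy must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def weighted_vote(labels, weights):
    """Weighted scheme: each contributor's vote is scaled by a per-worker weight."""
    if not labels or len(labels) != len(weights):
        raise ValueError("labels and weights must be non-empty and the same length")
    scores = defaultdict(float)
    for label, weight in zip(labels, weights):
        scores[label] += weight
    # max() returns the first maximal key, so ties go to the label seen first.
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Hypothetical data: three newer workers say "dog", two proven workers say "cat".
    labels = ["dog", "dog", "dog", "cat", "cat"]
    accuracies = [0.6, 0.6, 0.6, 0.9, 0.9]
    weights = [accuracy_to_weight(p) for p in accuracies]
    print(majority_vote(labels))           # dog
    print(weighted_vote(labels, weights))  # cat
```

On the same answers the two rules disagree: three weights of about 0.41 lose to two weights of about 2.20. Which rule is preferable depends on exactly the assumptions about tasks and volunteers that the question above asks us to formalize.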
Finally, we note that while the comparison to traditional compilers might
be a useful guide and metaphor, a
crowdsourcing analogue would have
to face a variety of issues that simply
do not arise with standard hardware
and software. In addition to the aforementioned challenges of deciding how
to organize and incentivize human
contributions, there may also be the
potential for malicious or deceptive
behavior by workers, and the need for
error correction of crowd work (which
is currently largely handled by redundancy and voting techniques).
Challenges to Overcome
We have argued that mathematical research has the potential to make great
contributions to social computing.
However, before this potential is fully
realized, there are several challenges
that must be addressed.
Blending mathematical and experimental research. Mathematical and experimental research are complementary and both are needed to develop
relevant mathematical foundations
for social computing. The strengths of
mathematical work include:
1. Mathematical modeling and analysis can be used to cleanly formulate and answer many questions about system behavior without requiring that we build a complete system, providing us with a tool to evaluate the impact of design decisions before committing to any particular design. For example, such models can provide guidance on how to increase participation (for instance, by comparing leaderboards to badges16,27), predict whether a social computing system will achieve critical mass, and perhaps help us understand how the behavior of groups of users changes as the system scales.
2. Mathematical guarantees are desirable for properties like user privacy (that can be obtained, for example, using techniques from the extensive and growing literature on differential privacy11), correctness of a system’s output, or the scalability of a social computing system.
3. Theoretical work in computer science provides tools for designing and analyzing new algorithms that could lie at the heart of social computing applications, answering questions like how to aggregate noisy and unstructured estimates or information from crowds,25,30 how to optimally divide a community into subgroups, or how to bring people together in moments of spare time to achieve a common goal.
4. Mathematical models can be used for counterfactual analysis, something that is notoriously difficult to do through experiments alone.
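As an illustration of the kind of privacy guarantee mentioned in item 2, the sketch below releases a count over user records using the Laplace mechanism, a standard construction from the differential privacy literature. It is a minimal Python sketch for exposition only, not hardened for deployment (naive floating-point noise sampling like this is known to leak information in practice). It assumes each record belongs to a distinct user, so adding or removing one user changes the count by at most 1 (sensitivity 1); the function names and example data are hypothetical. Noise with scale 1/ε then gives ε-differential privacy for this single query.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential draws, each with mean
    `scale`, is Laplace-distributed with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng=None):
    """Count records matching `predicate`, with epsilon-differential privacy.

    Assumes one record per user, so the count has sensitivity 1 and the
    Laplace mechanism adds noise with scale 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    # Hypothetical data: how many users are adults? (The true answer is 3.)
    users = [{"age": age} for age in [15, 22, 37, 17, 64]]
    noisy = private_count(users, lambda u: u["age"] >= 18, epsilon=0.5,
                          rng=random.Random(0))
    print(round(noisy, 2))  # a noisy value centered on 3
```

Smaller values of ε add more noise and give a stronger guarantee; the model makes this accuracy/privacy trade-off explicit before any system is built.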
Needless to say, mathematical modeling should not and cannot replace experimental work. A mathematical theory can only be truly tested through experiments, and discrepancies between the theory and experimental results provide guidance about how to revise the theory. For example, the ability of mathematical models to make valuable predictions about system behavior depends on an accurate model of system users, which is generally best developed through experimental work.
Learning from the social sciences.
Computer scientists cannot develop the mathematical foundations of social computing in isolation. Social computing systems are fundamentally social. These systems cannot be properly modeled or analyzed without accounting for the behavior of their human components. Much of the literature thus far uses standard models of economic agents and corresponding assumptions about agent preferences, but a growing literature based on experimental work on online platforms suggests that human behavior in several online settings might deviate from these models,26,35,40 and these deviations can have significant consequences for how to optimally design social computing systems.13,26
In order for mathematical foundations to provide useful practical results, they must be based on models that better reflect human behavior. This is most effectively achieved via a dialog between theoretical research on the one hand and experimental and empirical research on the other, with studies of human behavior informing mathematical