of computer use. However, in many of
these courses, the Web itself is treated as a specific instantiation of more
general principles. In other cases, the
Web is treated primarily as a dynamic
content mechanism that supports the
social interactions among multiple
browser users. Whether in CS studies
or in information-school courses, the
Web is often studied exclusively as the
delivery vehicle for content, technical
or social, rather than as an object of
study in its own right.
Here, we present the emerging interdisciplinary field of Web science,5,6
taking the Web as its primary object
of study. We show there is significant
interplay among the social interactions enabled by the Web’s design, the
scalable and open applications development mandated to support them,
and the architectural and data requirements of these large-scale applications
(see Figure 1). However, the study of
the relationships among these levels
is often hampered by the disciplinary
boundaries that tend to separate the
study of the underlying networking
from the study of the social applications. We identify some of these relationships and briefly review the status
of Web-related research within computing. We primarily focus on identifying emerging and extremely challenging problems researchers (in their role
as Web scientists) need to explore.
What is it?
Where physical science is commonly
regarded as an analytic discipline that
aims to find laws that generate or explain observed phenomena, CS is predominantly (though not exclusively)
synthetic, in that formalisms and algorithms are created in order to support
specific desired behaviors. Web science
deliberately seeks to merge these two
paradigms. The Web needs to be studied and understood as a phenomenon
but also as something to be engineered
for future growth and capabilities.
At the micro scale, the Web is an infrastructure of artificial languages and
protocols; it is a piece of engineering.
However, it is the interaction of human
beings creating, linking, and consuming information that generates the
Web’s behavior as emergent properties at the macro scale. These properties often generate surprising properties that require new analytic methods
to be understood. Some are desirable
and therefore to be engineered in;
others are undesirable and if possible
engineered out. We also need to keep
in mind that the Web is part of a wider
system of human interaction; it has
profoundly affected society, with each
emerging wave creating new challenges and opportunities in making information more available to wider sectors
of the population than ever before.
It may seem that the best way to understand the Web is as a set of protocols
that can be studied for their properties,
with individual applications analyzed
for their algorithmic properties. However, the Web wasn’t (and still isn’t)
built using the specify-design-build-test
development cycle that CS has traditionally viewed as software engineering
best practice.
Figure 2 outlines a new way of looking at Web development. A software
application is designed based on an
appropriate technology (such as algorithm and design) and with an envisioned “social” construct; it is indeed
a contradiction in terms to talk about
a Web application built for a single
user on a single machine. The system
is generally tested in a small group
or deployed on a limited basis; the
system’s “micro” properties are thus
tested. In some cases, when more and
more people accept the micro system,
accelerating “viral” scaling occurs. For
example, when Mosaic, the first popular Web browser, was released publicly
in 1993, the number of users quickly
grew by several orders of magnitude,
with more than a million downloads
in the first year; for more recent examples, consider photo-sharing on Flickr,
video-uploading on YouTube, and social-networking sites like MySpace and
Facebook.
The macro system, that is, the use
of the micro system by many users interacting with one another in often-unpredicted ways, is far more interesting
in and of itself and generally must be
analyzed in ways that are different from
the micro system. Also, these macro
systems engender new challenges that
do not occur at the micro scale; for example, the wide deployment of Mosaic
led to a need for a way to find relevant
material on the growing Web, and thus
search became an important applica-