big real-world networks like the Web,
as there is no full record of their evolution. Wikipedia now allows us to witness, and validate, preferential attachment at work on its graph.
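As an illustration of the mechanism being validated, the following is a minimal sketch of textbook preferential attachment, not the analysis performed on the Wikipedia data. The function name, the parameters `n_nodes` and `m`, the star-shaped seed graph, and the use of undirected links are all assumptions made for the example.

```python
import random


def preferential_attachment_degrees(n_nodes, m, seed=0):
    """Grow a graph in which each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree.
    Returns the list of final node degrees (hypothetical helper)."""
    if m < 1 or n_nodes <= m:
        raise ValueError("need n_nodes > m >= 1")
    rng = random.Random(seed)
    degree = [0] * n_nodes
    # Node i appears in the pool once per link endpoint, so a uniform draw
    # from the pool is a degree-proportional draw over nodes.
    pool = []
    for new in range(m, n_nodes):
        if new == m:
            # Assumed seed graph: the first arriving node links to nodes 0..m-1.
            chosen = set(range(m))
        else:
            chosen = set()
            while len(chosen) < m:
                chosen.add(rng.choice(pool))
        for old in chosen:
            degree[old] += 1
            degree[new] += 1
            pool.extend((old, new))
    return degree


degrees = preferential_attachment_degrees(1000, 2)
```

Sorting `degrees` in descending order typically shows a few heavily linked nodes and a long tail of sparsely linked ones, the signature of a scale-free network.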
Conclusion
The usefulness of an online encyclopedia depends on multiple factors,
including breadth and depth of coverage, organization and retrieval interface, and trustworthiness of content.
In Wikipedia more depth eventually
translates into breadth, because the
Wikipedia style guidelines recommend
the splitting of overly long articles. The
evolution of articles and links in Wikipedia allows us to model the system’s
growth. Our finding that the ratio of incomplete vs. complete articles remains
constant yields a picture of sustainable
coverage and growth. An increasing
ratio would result in thinner coverage
and diminishing utility, whereas a decreasing ratio would lead
to the eventual stagnation of Wikipedia’s growth.
The idea of growth triggered by
undefined references is supported by
our second finding—that most new articles are created shortly after a corresponding reference to them is entered
into the system. We also found that
new articles are typically written by different authors from the ones behind
the references to them. Therefore, the
scalability of the endeavor is limited
not by the capacity of individual contributors but by the total size of the
contributor pool.
Wikipedia’s incremental-growth
model, apart from providing an in-vivo validation of Barabási’s scale-free
network-development theory, suggests that the processes we have discovered may continue to shape Wikipedia in the future. Wikipedia growth
could be limited by invisible subjective boundaries related to the interests of its contributors. Our growth
model suggests how these boundaries
might be bridged. If references to nonexistent entries prompt
the creation of those entries, and if we assume
that all human knowledge forms a
fully connected network, then Wikipedia’s coverage will broaden through a
breadth-first graph traversal or flood-filling process, albeit at an uneven
pace.
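The breadth-first broadening described here can be sketched in a few lines. This is a toy illustration under stated assumptions, not a model fitted to Wikipedia: the `knowledge` graph, its topic names, and the function `coverage_waves` are hypothetical, and each "wave" stands for one round in which every pending undefined reference is turned into an article.

```python
from collections import deque


def coverage_waves(links, seed_articles):
    """Breadth-first traversal: map each reachable topic to the wave
    (distance from the seed articles) in which it would be written."""
    wave = {topic: 0 for topic in seed_articles}
    queue = deque(seed_articles)
    while queue:
        article = queue.popleft()
        for referenced in links.get(article, ()):
            if referenced not in wave:
                # An undefined reference prompts the creation of its entry.
                wave[referenced] = wave[article] + 1
                queue.append(referenced)
    return wave


# Hypothetical toy knowledge graph: topic -> topics it refers to.
knowledge = {
    "Physics": ["Energy", "Mathematics"],
    "Mathematics": ["Set theory", "Logic"],
    "Energy": ["Thermodynamics"],
    "Set theory": ["Logic"],
    "Logic": ["Philosophy"],
    "Thermodynamics": [],
    "Philosophy": [],
}

waves = coverage_waves(knowledge, ["Physics"])
```

Starting from the single seed "Physics", the traversal reaches every topic in the toy graph, with "Philosophy" covered last, in the third wave. Topics with no path from the seeds would never be covered, which is why the connectivity assumption matters.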
How far might the Wikipedia process carry us? In Jorge Luis Borges’s
1946 short story “On Exactitude in Science,” the wise men of the empire undertake to create a complete map of the
empire; upon finishing, they realize the
map was so big it coincided with the
empire itself.
References
1. Barabási, A.-L. and Albert, R. Emergence of scaling in
random networks. Science 286, 5439 (Oct. 15, 1999),
509–512.
2. Bryant, S., Forte, A., and Bruckman, A. Becoming
Wikipedian: Transformation of participation in a
collaborative online encyclopedia. In Proceedings of
the 2005 International ACM SIGGROUP Conference
on Supporting Group Work (Sanibel Island, FL, Nov.
6–9). ACM Press, New York, 2005, 1–10.
3. Capocci, A., Servedio, V., Colaiori, F., Buriol, L., Donato,
D., Leonardi, S., and Caldarelli, G. Preferential
attachment in the growth of social networks: The case
of Wikipedia. Physical Review E, 74, 036116 (2006).
4. Cunningham, W. and Leuf, B. The Wiki Way: Quick
Collaboration on the Web. Addison-Wesley, Boston,
MA, 2001.
5. Denning, P., Horning, J., Parnas, D., and Weinstein, L.
Wikipedia risks. Commun. ACM 48, 12 (Dec. 2005),
152.
6. Garfield, E. Citation Indexing: Its Theory and
Application in Science, Technology, and Humanities.
John Wiley & Sons, Inc., New York, 1979.
7. Giles, J. Internet encyclopaedias go head to head.
Nature 438, 7070 (Dec. 15, 2005), 900–901.
8. Mehler, A. Text linkage in the wiki medium: A
comparative study. In Proceedings of the EACL 2006
Workshop on New Text: Wikis and Blogs and Other
Dynamic Text Sources (Trento, Italy, Apr. 6, 2006),
1–8.
9. Mitzenmacher, M. A brief history of generative models
for power law and lognormal distributions. Internet
Mathematics 1, 2 (2003), 226–251.
10. Remy, M. Wikipedia: The free encyclopedia. Online
Information Review 26, 6 (2002), 434.
11. Stvilia, B., Twidale, M., Smith, L., and Gasser, L.
Assessing information quality of a community-based
encyclopedia. In Proceedings of the International
Conference on Information Quality (Cambridge, MA,
Nov. 4–6, 2005), 442–454.
12. Viégas, F., Wattenberg, M., and Dave, K. Studying
cooperation and conflict between authors with history
flow visualizations. In Proceedings of the SIGCHI
Conference on Human Factors in Computing Systems
(Vienna, Austria, Apr. 24–29). ACM Press, New York,
2004, 575–582.
13. Voß, J. Measuring Wikipedia. In Proceedings of the 10th
International Conference of the International Society
for Scientometrics and Informetrics (Stockholm, July
24–28, 2005), 221–231.
This work is partially funded by the European
Commission’s Sixth Framework Programme
under contract IST-2005-033331 “Software
Quality Observatory for Open Source Software
(SQO-OSS).”
Diomidis Spinellis (dds@aueb.gr) is an associate
professor of information system technologies in the
Department of Management Science and Technology
at the Athens University of Economics and Business,
Athens, Greece.
Panagiotis Louridas (louridas@acm.org) is a software
engineer in the Greek Research and Technology Network
and a researcher at the Department of Management
Science and Technology, Athens University of Economics
and Business, Athens, Greece.