This collaboratively edited knowledgebase
provides a common source of data for
Wikipedia, and everyone else.
BY DENNY VRANDEČIĆ AND MARKUS KRÖTZSCH
Unnoticed by most of its readers, Wikipedia continues to undergo
dramatic changes, as its sister project Wikidata introduces a new
multilingual “Wikipedia for data” (http://www.wikidata.org) to manage
the factual information of the popular online encyclopedia. With
Wikipedia’s data becoming cleaned and integrated in a single location,
opportunities arise for many new applications.

Originally conceived in 2001 as a mainly text-based resource,
Wikipedia¹ has collected increasing amounts of structured data,
including numbers, dates, coordinates, and many types of relationships,
from family trees to the taxonomy of species. It has become a resource
of enormous value, with potential applications across all areas of
science, technology, and culture. This development is hardly
surprising, given that Wikipedia is committed to “a world in which
every single human being can freely share in the sum of all knowledge,”
according to its vision statement (wiki/Vision). There is no question
this must include data that can be searched, analyzed, and reused.

It may be surprising that Wikipedia does not provide direct access to
most of it, through either query services or downloadable data exports.
Actual use of the data is rare and often restricted to specific pieces
of information (such as geo-tags of Wikipedia articles used in Google
Maps). The reason for this striking gap between vision and reality is
that Wikipedia’s data is buried in 30 million Wikipedia articles in 287
languages from which extraction is inherently very difficult.

This situation is unfortunate for anyone wanting to use the data but is
also an increasing threat to Wikipedia’s main goal of providing
up-to-date, accurate, encyclopedic knowledge. The same information
often appears in articles in many languages and in many articles within
a single language. Population numbers for Rome, for example, can be
found in English and Italian articles about Rome but also in the
English article “Cities in Italy.” The numbers are all different.

Wikidata aims to overcome such inconsistencies by creating new ways for
Wikipedia to manage its data on a global scale; see the result at
http://www.wikidata.org. The following essential design decisions
characterize the Wikidata approach.

Open editing. As in Wikipedia, Wikidata allows every user to extend and
edit the stored information, even without creating an account. A
form-based interface makes editing easy.
• Wikidata provides a free collaborative knowledgebase all can share.
• Wikidata has quickly become one of the most active Wikimedia projects.
• Wikipedia, as well as an increasing number of other sites, taps
  content from Wikidata in every pageview, magnifying the data’s
  visibility and usefulness.
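
The inconsistency described for Rome's population, and the idea of managing each fact in a single location, can be sketched in a few lines of Python. This is only a minimal illustration, not Wikidata's actual data model or API: the dictionaries, the function names `distinct_values`, `render`, and `edit`, and all numbers are hypothetical placeholders chosen for the example.

```python
# Illustrative sketch only: every name and number here is a hypothetical
# placeholder, not a real Wikipedia or Wikidata value or API.

# Duplicated model: each article keeps its own copy of the same fact.
article_copies = {
    ("en", "Rome"): 1000,
    ("it", "Roma"): 1001,
    ("en", "Cities in Italy"): 1002,
}


def distinct_values(copies):
    """Return the set of distinct values stored across all article copies."""
    return set(copies.values())


# Shared model: one central record, keyed by (item, property).
central_store = {("Rome", "population"): 1000}

# Articles hold a reference to the central record instead of a value.
article_refs = {
    ("en", "Rome"): ("Rome", "population"),
    ("it", "Roma"): ("Rome", "population"),
    ("en", "Cities in Italy"): ("Rome", "population"),
}


def render(article, refs, store):
    """Look up the value an article would display from the central store."""
    return store[refs[article]]


def edit(store, key, value):
    """Apply one edit to the central record; all referencing articles see it."""
    store[key] = value


# The duplicated copies have drifted apart.
print(len(distinct_values(article_copies)))

# A single central edit is reflected in every article that references it.
edit(central_store, ("Rome", "population"), 1234)
shown = {render(article, article_refs, central_store) for article in article_refs}
print(shown)
```

In the duplicated model, keeping the three copies consistent takes three separate edits in three places. In the shared model, one edit suffices, and every article that references the record displays the same value.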