KBs in a CS fashion. The IWP project35
extracts structured data from the textual pages of Wikipedia, then asks users to verify the extraction accuracy.
The Cimple/DBLife project4, 5 lets users correct the extracted structured
data, expose it in wiki pages, then add
even more textual and structured data.
Thus, it builds structured “community
wikipedias,” whose wiki pages mix
textual data with structured data (that
comes from an underlying structured
KB). Other related works include YAGO-NAGA,11
BioPortal,17 and many recent
projects in the Web, Semantic Web,
and AI communities.1, 16, 36
In general, building a structured KB
often requires selecting a set of data
sources, extracting structured data
from them, then integrating the data
(for example, matching and merging
“David Smith” and “D.M. Smith”). Users can help with these steps in two ways.
First, they can improve the automatic
algorithms of the steps (if any), by editing their code, creating more training
data,17 answering their questions,12, 13 or
providing feedback on their output.12, 35
Second, users can manually participate in the steps. For example, they can
manually add or remove data sources,
extract or integrate structured data, or
add even more structured data, data
not available in the current sources
but judged relevant.5 In addition, a CS
system may perform inferences over
its KB to infer more structured data. To
help with this step, users can contribute inference rules and domain knowledge.25
During all such activities, users can
naturally cross-edit and merge one another’s contributions, just as in
systems that build textual KBs.
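The matching-and-merging step, and the way user feedback can improve it, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any system cited above: the `Matcher` class, the `name_key` heuristic (first initial plus last name), and the sample names other than “David Smith” and “D.M. Smith” are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def name_key(name: str) -> tuple[str, str]:
    """Reduce a non-empty person name to (first initial, last name), lowercased.

    Hypothetical heuristic: "David Smith" and "D.M. Smith" both map to ("d", "smith").
    """
    parts = name.replace(".", " ").split()
    return (parts[0][0].lower(), parts[-1].lower())


@dataclass
class Matcher:
    # User verdicts keyed by an unordered pair of names; True means "same entity".
    feedback: dict[frozenset[str], bool] = field(default_factory=dict)

    def record_feedback(self, a: str, b: str, same: bool) -> None:
        """Store a user's judgment about whether two names denote one entity."""
        self.feedback[frozenset((a, b))] = same

    def same_entity(self, a: str, b: str) -> bool:
        """User feedback, when present, overrides the automatic heuristic."""
        verdict = self.feedback.get(frozenset((a, b)))
        if verdict is not None:
            return verdict
        return name_key(a) == name_key(b)

    def merge(self, names: list[str]) -> list[list[str]]:
        """Greedily group names; a name joins the first cluster it matches entirely."""
        clusters: list[list[str]] = []
        for name in names:
            for cluster in clusters:
                if all(self.same_entity(name, other) for other in cluster):
                    cluster.append(name)
                    break
            else:
                clusters.append([name])
        return clusters
```

With no feedback, the heuristic wrongly places a third name such as “Dana Smith” in the same cluster as the other two. After a single call to `record_feedback("David Smith", "Dana Smith", False)`, `merge` keeps it apart, which is the kind of correction the first mode of user help describes.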
Another interesting target problem
is building and improving systems
running on the Web. The project
Wikia Search (search.wikia.com) lets
users build an open source search engine,
by contributing code, suggesting
URLs to crawl, and editing search
result pages (for example, promoting
or demoting URLs). Wikia Search was
recently disbanded, but similar features
(such as editing search pages)
appear in other search engines (such
as Google, mahalo.com). Freebase
lets users create custom browsing
and search systems (deployed at Freebase),
using the community-curated
data and a suite of development tools
(such as the Metaweb query language
and a hosted development environment).
Eurekster.com lets users collaboratively
build vertical search engines
called swickis, by customizing
a generic search engine (for example,
specifying all URLs the system should
crawl). Finally, MOBS, an academic
project,12, 13 studies how to collaboratively
build data integration systems,
those that provide a uniform query
interface to a set of data sources. MOBS
enlists users to create a crucial system
component, namely the semantic
mappings (for example, “location” =
“address”) between the data sources.
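To make the role of semantic mappings concrete, the following sketch shows how user-contributed mappings can let one query run over several sources. It is an illustration only and does not reflect how MOBS is implemented: the source names, the sample records, and the `to_mediated` and `query` functions are hypothetical. Only the “location” = “address” correspondence comes from the text.

```python
from __future__ import annotations

Record = dict[str, str]
# Maps (source name, source attribute) to an attribute of the uniform schema.
Mappings = dict[tuple[str, str], str]


def to_mediated(source: str, record: Record, mappings: Mappings) -> Record:
    """Rename a record's attributes to uniform names; unmapped attributes are dropped."""
    return {
        mappings[(source, attr)]: value
        for attr, value in record.items()
        if (source, attr) in mappings
    }


def query(
    sources: dict[str, list[Record]],
    mappings: Mappings,
    attr: str,
    value: str,
) -> list[Record]:
    """Return all records, from any source, whose uniform attribute equals value."""
    results: list[Record] = []
    for source, records in sources.items():
        for record in records:
            unified = to_mediated(source, record, mappings)
            if unified.get(attr) == value:
                results.append(unified)
    return results


# Hypothetical sources that describe the same kind of data with different attribute names.
SOURCES = {
    "source_a": [{"name": "Cafe One", "location": "12 Main St"}],
    "source_b": [{"title": "Cafe Two", "address": "12 Main St"}],
}

# Mappings as users might contribute them, including "location" = "address".
USER_MAPPINGS: Mappings = {
    ("source_a", "name"): "name",
    ("source_a", "location"): "address",
    ("source_b", "title"): "name",
    ("source_b", "address"): "address",
}

matches = query(SOURCES, USER_MAPPINGS, "address", "12 Main St")
```

Here a single query on `address` retrieves records from both sources, even though one of them stores the value under `location`. If the mapping for `("source_a", "location")` were missing, that source would silently contribute nothing, which suggests why the mappings are described as a crucial component.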