Figure 2. All data/information can be linked in a mesh through relationships. Common, machine-processable formats are used to represent every aspect of a data mesh.
structured data at Web scale. Recent success stories in applying knowledge representation to specific domains, such as the myGrid project (see http://www.mygrid.org.uk/) in bioinformatics research,¹⁰ demonstrate the potential benefits of semantic computing technologies. Here, we use the term “data mesh” to encompass the various concepts and approaches that could be used or combined to support a semantics-rich ecosystem of research tools and services. It is not our intention to suggest there would be one single data mesh representing all human knowledge.
We expect a great number of vocabularies, many of them overlapping, to emerge for representing every aspect of a data mesh (such as geolocation, mood, reviews, personal information, and domain-specific concepts and terms). Ontologies will support an evolving ecosystem of facts, vocabularies, and relationships in specific domains. We are already witnessing a plethora of emerging efforts to standardize such vocabularies, including microformats (http://www.microformats.org/), data portability (http://www.dataportability.org/), the Gene Ontology (http://www.geneontology.org/), and others.
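To make the idea of overlapping vocabularies concrete, here is a minimal Python sketch that stores facts as subject-predicate-object triples (an RDF-style model we assume for illustration) and maps a term from one vocabulary onto an equivalent term from another before answering a pattern query. Every identifier, prefix, and the equivalence mapping itself are invented; a real data mesh would rely on published vocabularies and ontology-level mappings.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical mapping between two overlapping vocabularies that
# describe the same relationship with different terms.
EQUIVALENT_PREDICATES = {"contact:livesIn": "geo:locatedIn"}


def normalize(t: Triple) -> Triple:
    """Rewrite a predicate to its preferred equivalent, if one is known."""
    return Triple(t.subject, EQUIVALENT_PREDICATES.get(t.predicate, t.predicate), t.obj)


def match(facts: list[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None) -> list[Triple]:
    """Return normalized facts matching a pattern; None acts as a wildcard."""
    if predicate is not None:
        predicate = EQUIVALENT_PREDICATES.get(predicate, predicate)
    results = []
    for t in map(normalize, facts):
        if ((subject is None or t.subject == subject)
                and (predicate is None or t.predicate == predicate)
                and (obj is None or t.obj == obj)):
            results.append(t)
    return results


# Invented facts drawn from two vocabularies that overlap on location.
FACTS = [
    Triple("person:evelyne", "social:knows", "person:savas"),
    Triple("person:evelyne", "geo:locatedIn", "place:cambridge"),
    Triple("person:savas", "contact:livesIn", "place:seattle"),
]

for fact in match(FACTS, predicate="geo:locatedIn"):
    print(fact.subject, fact.obj)
```

Because `contact:livesIn` is normalized to `geo:locatedIn`, the query returns locations asserted through either vocabulary, which is the kind of reconciliation overlapping vocabularies will demand.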
Programs will consume, combine, and correlate everything in the universe of structured information and help users reason over it. Information access policies permitting, they will allow users to ask questions against this global collection of facts, such as “Which is the most popular book among my friends today?”, “Who is the expert on aspect A of my business workflow inside my organization?”, “Have Evelyne and Savas ever been at the same conference at the same time?”, or “What’s the degree of separation, in terms of citations, between my paper and the seminal work by Jim Gray?”
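A question such as the citation-distance one reduces to a shortest-path search once citations are available as machine-processable links. The sketch below assumes citations are directed “cites” edges and uses breadth-first search; the paper identifiers are hypothetical placeholders, not real bibliographic records.

```python
from collections import deque
from typing import Optional


def citation_distance(cites: dict[str, list[str]], start: str, target: str) -> Optional[int]:
    """Breadth-first search over 'cites' edges.

    Returns the number of citation hops from start to target,
    or None if target cannot be reached.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        paper, dist = queue.popleft()
        for cited in cites.get(paper, []):
            if cited == target:
                return dist + 1
            if cited not in seen:
                seen.add(cited)
                queue.append((cited, dist + 1))
    return None


# Invented citation graph: each key cites the papers in its list.
CITES = {
    "my-paper": ["survey-paper", "workflow-paper"],
    "survey-paper": ["gray-seminal"],
    "workflow-paper": ["survey-paper"],
}

print(citation_distance(CITES, "my-paper", "gray-seminal"))  # prints 2
```

Breadth-first search is used because it visits papers in order of hop count, so the first time the target is reached is the shortest citation chain.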
While data mesh instances can be built in isolation (as in many of today’s social networks), we believe there is tremendous potential value in aggregating them and combining them into one huge network of facts. This idea is similar to Tim Berners-Lee’s more recent writing about the ‘Giant Global Graph of Facts’ (see http://dig.csail.mit.edu/breadcrumbs/node/215). Note that we are not suggesting there would be a single repository of facts, or even universal agreement on what is represented. We do expect, however, to see machine-based technologies
that can reason over this diverse set of facts, often using probabilistic techniques.
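As one illustration of probabilistic reasoning over facts aggregated from several data mesh instances, the following sketch combines per-source confidences with a noisy-OR rule. Noisy-OR is a technique chosen here for illustration, not one the text prescribes; it assumes each source’s confidence is an independent probability, and the source names and facts are invented.

```python
from collections import defaultdict
from math import prod

Fact = tuple[str, str, str]


def noisy_or(confidences: list[float]) -> float:
    """Combine independent confidences: P(fact) = 1 - prod(1 - p_i)."""
    for p in confidences:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"confidence out of range: {p}")
    return 1.0 - prod(1.0 - p for p in confidences)


def merge_sources(sources: dict[str, dict[Fact, float]]) -> dict[Fact, float]:
    """Pool every source's confidence for each fact, then combine with noisy-OR."""
    evidence: dict[Fact, list[float]] = defaultdict(list)
    for facts in sources.values():
        for fact, confidence in facts.items():
            evidence[fact].append(confidence)
    return {fact: noisy_or(ps) for fact, ps in evidence.items()}


# Invented data mesh instances, each asserting facts with a confidence.
SOURCES = {
    "conference-site": {("evelyne", "attended", "conf-2008"): 0.9},
    "social-network": {
        ("evelyne", "attended", "conf-2008"): 0.5,
        ("savas", "attended", "conf-2008"): 0.6,
    },
}

merged = merge_sources(SOURCES)
for fact, p in merged.items():
    print(fact, round(p, 2))
```

Agreement between independent sources raises the merged confidence (two sources at 0.9 and 0.5 yield 1 − 0.1 × 0.5 = 0.95), while a fact reported by a single source keeps that source’s confidence. The independence assumption is strong; sources that copy one another would be over-counted.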
We are already witnessing the emergence of data mesh instances on the Web, especially in social networks. Zune Social (http://social.zune.net/) is an example of how a social network can be combined with information about music preferences, recommendations, and an online marketplace. Facebook (http://www.facebook.com/) is another example of how connections between identities can help aggregate user-oriented preferences and then infer behavior and preference statistics. Finally, Powerset (http://www.powerset.com/) is an example of a search service that leverages existing structured information (for example, Freebase, http://www.freebase.com/) or generates it from unstructured sources (for example, by applying natural language processing to Wikipedia content) to improve the quality of query results.
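Aggregating preferences over connections between identities, in the spirit of the Facebook example and of the question “Which is the most popular book among my friends today?”, can be sketched as a count over a user’s friends. The friendship and reading data below are hypothetical, and a real service would also have to enforce the information access policies the text mentions.

```python
from collections import Counter
from typing import Optional

# Hypothetical social graph and today's reading activity.
FRIENDS = {"alice": {"bob", "carol", "dave"}}
READ_TODAY = {
    "bob": ["book:dune"],
    "carol": ["book:dune", "book:emma"],
    "dave": ["book:emma", "book:dune"],
    "erin": ["book:ulysses"],  # not one of alice's friends, so never counted
}


def most_popular_among_friends(user: str) -> Optional[str]:
    """Return the book read today by the most friends of `user`, or None."""
    counts: Counter[str] = Counter()
    for friend in FRIENDS.get(user, set()):
        # set() so each friend counts at most once per book
        counts.update(set(READ_TODAY.get(friend, [])))
    if not counts:
        return None
    book, _ = counts.most_common(1)[0]
    return book


print(most_popular_among_friends("alice"))  # prints book:dune
```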
We believe that, over time, a huge ecosystem of services and tools will emerge around data mesh instances. Such tools and services will allow us to move beyond current information management practice by incorporating more automation. Recommendation engines will be the norm, and our interactions with computers will always be context-aware (for example, “since the topic of the paper being written is botany, a query about ‘bush’ is unlikely to be about a person’s name” or “a search for papers on orchids will take into consideration the opinions of people in the user’s professional social network”). While today we can search the global graph of linked Web pages, consisting predominantly of unstructured data, in the future we will be able to search over all types of semantically enriched information, which will in turn enable a wide range of new applications, such as recommendation services, information management automation, and information inferencing.
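The “bush” example can be approximated with a simple term-overlap heuristic: each sense of an ambiguous query term is described by a few associated terms, and the sense sharing the most terms with the user’s current context wins. The sense inventory here is hypothetical; a semantics-rich system would presumably draw sense descriptions from ontologies rather than from a hand-written table.

```python
from typing import Optional

# Hypothetical sense inventory: each sense of an ambiguous term is
# described by a handful of associated terms.
SENSES = {
    "bush": {
        "plant": {"botany", "shrub", "plant", "garden", "leaf"},
        "person": {"president", "politics", "surname", "biography"},
    },
}


def disambiguate(term: str, context: set[str]) -> Optional[str]:
    """Pick the sense of `term` sharing the most terms with `context`.

    Returns None if the term is unknown or no sense overlaps the context.
    """
    senses = SENSES.get(term)
    if not senses:
        return None
    best = max(senses, key=lambda sense: len(senses[sense] & context))
    return best if senses[best] & context else None


# Context taken from the topic of the paper currently being written.
print(disambiguate("bush", {"botany", "orchid", "paper"}))  # prints plant
```

Returning None when nothing overlaps lets a caller fall back to an undisambiguated search instead of guessing.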
Tools and Services to Support Research
We believe the research community
will play a central role in supporting
and further evolving the semantic