ions. Typical research-related processes could be augmented or even
completely supplanted. For example,
researchers could automatically get
recommendations of papers and contacts based on what they are currently
doing; experts might be automatically
identified in a domain based on discussions around their papers and blog
entries; peer reviewing could evolve to
take into consideration the new social
media and Web-based interactions;
and even ‘impact factors’ for institutions might incorporate electronic
analysis of all types of information
and not just citations to publications
and research grants.
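The article does not say how such recommendations would be computed. As a minimal sketch, assuming each paper and the researcher's current activity have already been reduced to sets of tags, candidate papers could be ranked by Jaccard similarity (the size of the tag intersection divided by the size of the union). The function names, tags, and paper identifiers below are hypothetical.

```python
# Hypothetical sketch: rank papers by tag overlap with a researcher's current activity.


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(activity: set[str], papers: dict[str, set[str]], k: int = 3) -> list[tuple[str, float]]:
    """Return up to k (paper_id, score) pairs with a non-zero score, best first."""
    scored = [(pid, jaccard(activity, tags)) for pid, tags in papers.items()]
    scored = [pair for pair in scored if pair[1] > 0.0]
    # Highest score first; ties broken by paper id so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    # Made-up tags describing what the researcher is currently working on.
    current = {"workflow", "provenance", "bioinformatics"}
    papers = {
        "paper-A": {"workflow", "provenance", "scheduling"},
        "paper-B": {"bioinformatics", "ontology"},
        "paper-C": {"astronomy", "survey"},
    }
    for pid, score in recommend(current, papers):
        print(f"{pid}: {score:.2f}")
```

Running it prints `paper-A: 0.50` and then `paper-B: 0.25`; paper-C shares no tags and is omitted. Tag overlap is only one possible signal; citations and social connections could be combined with it.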
Our support of the myExperiment
(http://www.myexperiment.org/) project is a demonstration of our belief that
scientific collaboration and information sharing can be supported through
social networking. The myExperiment
project brings together social networks and workflows in a single information graph—a data mesh—that can
be browsed, analyzed, and searched.
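To make the idea of a single information graph more concrete, here is a minimal sketch, assuming both social links and workflows are recorded as (subject, predicate, object) triples in the spirit of RDF. The predicate names (`follows`, `authored`, `taggedWith`), the `DataMesh` class, and the data are hypothetical; they are not myExperiment's actual data model or API.

```python
# Hypothetical sketch of a data mesh: social links and workflows as triples in one graph.
from collections import defaultdict


class DataMesh:
    """A tiny in-memory triple store with a subject index for browsing."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()
        # Outgoing edges indexed by subject, so browsing a node is a dictionary lookup.
        self.out: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, predicate: str, obj: str) -> None:
        """Add a triple once; duplicates are ignored."""
        if (subject, predicate, obj) not in self.triples:
            self.triples.add((subject, predicate, obj))
            self.out[subject].append((predicate, obj))

    def browse(self, node: str) -> list[tuple[str, str]]:
        """Browse: the (predicate, object) edges leaving a node, in insertion order."""
        return list(self.out.get(node, []))

    def objects(self, subject: str, predicate: str) -> list[str]:
        """Follow one predicate from a subject."""
        return [o for p, o in self.out.get(subject, []) if p == predicate]

    def search(self, predicate: str, obj: str) -> list[str]:
        """Search: all subjects that have the given (predicate, object) edge."""
        return sorted(s for s, p, o in self.triples if p == predicate and o == obj)


def workflows_from_contacts(mesh: DataMesh, person: str, tag: str) -> list[str]:
    """Analyze: workflows tagged `tag` that were authored by people `person` follows."""
    found = set()
    for contact in mesh.objects(person, "follows"):
        for wf in mesh.objects(contact, "authored"):
            if tag in mesh.objects(wf, "taggedWith"):
                found.add(wf)
    return sorted(found)


if __name__ == "__main__":
    mesh = DataMesh()
    for triple in [
        ("alice", "follows", "bob"),
        ("alice", "follows", "carol"),
        ("bob", "authored", "wf:protein-align"),
        ("carol", "authored", "wf:sky-survey"),
        ("wf:protein-align", "taggedWith", "proteomics"),
        ("wf:sky-survey", "taggedWith", "astronomy"),
    ]:
        mesh.add(*triple)
    print(mesh.browse("alice"))
    print(workflows_from_contacts(mesh, "alice", "proteomics"))  # ['wf:protein-align']
```

Keeping people and workflows in the same graph is what allows one query to move from a researcher, to the people they follow, to the workflows those people published.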
Conclusion
As researchers and scientific instruments can now produce and publish
large amounts of data and information more easily than at any other
point in history, there is an increasing
requirement for automation tools to
help manage and navigate the deluge
of research data. For example, projects
like Pan-STARRS (http://pan-starrs.ifa.hawaii.edu/) and the LHC
(http://lhc.web.cern.ch/) will generate many petabytes of data. The emergence of folksonomies on the Web is one example
of how user-driven categorization can
help with information discovery. The
need to deal with meaningful and relevant information within the context
of one’s actions is growing. There is an
immense opportunity for the research
community to bring its expertise and
experience together in accelerating
the development of semantic computing technologies. We need to invest
significant resources in making the semantic computing vision a reality by:
• investing in semantics-aware infrastructure;
• increasing awareness of the potential of semantics-based computing; and
• training more researchers on semantics-based computing and related technologies.
The discussion on data meshes
shows the potential value of aggregating information in a (semi-)structured,
machine-interpretable manner. We
believe an ecosystem of desktop tools,
cloud services, and data formats will
emerge to support “information and
knowledge management,” namely, the
(automatic) acquisition, representation, aggregation, indexing, discovery,
consumption, correlation, management, and inference of information.
Doing so at scale would significantly
improve the way we discover and share
information and how we collaborate.
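Several of the steps listed above (acquisition, aggregation, indexing, and discovery) can be illustrated with a small sketch, assuming records from different sources have already been converted to a common shape. The `Record` type, source labels, and sample text are hypothetical, and plain keyword matching stands in for the richer semantic processing the article envisions.

```python
# Hypothetical sketch: aggregate records from several sources, index them, discover by keyword.
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str   # where the item was acquired from (illustrative labels)
    item_id: str
    text: str


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric terms; no stemming or stop-word removal."""
    return re.findall(r"[a-z0-9]+", text.lower())


def aggregate(*sources: list[Record]) -> dict[str, Record]:
    """Merge records from all sources, keyed by item_id; later sources win on duplicates."""
    merged: dict[str, Record] = {}
    for records in sources:
        for rec in records:
            merged[rec.item_id] = rec
    return merged


def build_index(records: dict[str, Record]) -> dict[str, set[str]]:
    """Inverted index: term -> ids of the items whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for item_id, rec in records.items():
        for term in tokenize(rec.text):
            index[term].add(item_id)
    return index


def discover(index: dict[str, set[str]], query: str) -> list[str]:
    """Return ids of items that contain every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


if __name__ == "__main__":
    papers = [
        Record("publications", "p1", "Semantic annotation of workflow provenance"),
        Record("publications", "p2", "Scaling sky survey pipelines"),
    ]
    posts = [Record("blog", "b1", "Notes on provenance for sky survey data")]
    index = build_index(aggregate(papers, posts))
    print(discover(index, "provenance"))   # ['b1', 'p1']
    print(discover(index, "sky survey"))   # ['b1', 'p2']
```

Because the index is built over the aggregated records, a single query finds a blog entry and a publication together, which is the kind of cross-source discovery the paragraph describes.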
We have described a representative set of investments we are making
to ease the transition of researchers
toward a world where information is
produced and consumed in a structured and semantics-rich manner
(more information about the work and
research tools offered by Microsoft Research for scientists can be found at
http://research.microsoft.com/en-us/collaboration/about/). However, this
will not happen instantly. There is a lot
of unstructured data out there already.
Data-mining technologies are necessary to automatically extract as much
semantically rich information as possible. For example, Microsoft’s Live
Labs has worked on machine-learning-based technologies to extract entities from the unstructured Web (see
http://livelabs.com/projects/entity-extraction/). The research world needs
similar technologies, deployed at scale, that can aggregate, index, and
mine research-related information.
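The Live Labs extractor is described only as machine-learning-based, and its internals are not given here. As a much simpler stand-in (not the Live Labs technique), the sketch below uses a gazetteer, a dictionary of known names mapped to entity types, and reports every non-overlapping match, preferring longer names. The dictionary entries and the function name are hypothetical.

```python
# Hypothetical sketch: gazetteer-based entity extraction, a simple stand-in for ML-based extractors.
import re

# Illustrative dictionary of surface forms -> entity types; not taken from any real system.
GAZETTEER = {
    "pan-starrs": "Project",
    "large hadron collider": "Instrument",
    "lhc": "Instrument",
    "university of newcastle upon tyne": "Organization",
    "newcastle": "Location",
}


def extract_entities(text: str, gazetteer: dict[str, str] = GAZETTEER) -> list[tuple[str, str, int]]:
    """Return (matched_text, entity_type, start_offset) for each non-overlapping match.

    Longer names are placed first in the alternation, so at any position the
    longest known name wins (e.g. the full university name rather than 'Newcastle').
    """
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.group(0), gazetteer[m.group(0).lower()], m.start())
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    sample = ("Projects like Pan-STARRS and the LHC will generate petabytes of data; "
              "Parastatidis is at the University of Newcastle upon Tyne.")
    for surface, kind, offset in extract_entities(sample):
        print(kind, surface, offset)
```

A dictionary approach only finds names it already knows; the appeal of learned extractors is that they can generalize to unseen entities in unstructured text.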
We believe such an ecosystem of
semantics-aware tools and services
will ultimately become the norm in our
day-to-day interactions with computers, constituting a global “smart cyberinfrastructure.” However, if the big
companies are to invest in implementing these ideas and technologies in
their offerings (products and services),
the research community must test and
demonstrate their potential as part of
the community’s attempt to build a
smart cyberinfrastructure for research.
Ultimately, this vision of a data mesh
and smart cyberinfrastructure will go
some way toward realizing the visions
of the early pioneers like Vannevar
Bush[3] and J.C.R. Licklider[8].
References
1. Berners-Lee, T., Hendler, J.A., and Lassila, O. The
Semantic Web. Scientific American (May 2001).
2. Borgman, C.L. Scholarship in the Digital Age:
Information, Infrastructure, and the Internet. MIT
Press, 2007.
3. Bush, V. As we may think. The Atlantic Monthly (1945);
www.theatlantic.com/doc/194507/bush
4. Cycorp. Cyc Knowledge Base; www.cyc.com/
5. Dirks, L. and Hey, T., eds. CT Watch Quarterly: The
Coming Revolution in Scholarly Communications &
Cyberinfrastructure 3, 3 (Aug. 2007).
6. Fellbaum, C., ed. WordNet. The MIT Press, 1998.
7. Helbig, H. Knowledge Representation and the
Semantics of Natural Language. Springer, Berlin,
2006.
8. Licklider, J.C.R. Libraries of the Future. MIT Press,
1965.
9. Shadbolt, N., Berners-Lee, T., and Hall, W. The
Semantic Web revisited. IEEE Intelligent Systems 21,
3 (Mar. 2006), 96–101.
10. Wroe, C. et al. A suite of DAML+OIL ontologies
to describe bioinformatics Web services and data.
International Journal of Cooperative Information
Systems 12 (2003), 197–224.
Savas Parastatidis (Savas.Parastatidis@microsoft.com) is a principal developer with Microsoft Technical
Computing and a visiting fellow at the University of
Newcastle upon Tyne, U.K.
Evelyne Viegas (evelynev@microsoft.com) is a senior
program manager responsible for the Online Technologies
and Web Cultures initiative at Microsoft Research in
Redmond, WA.
Tony Hey (Tony.Hey@microsoft.com) is a corporate vice
president of the External Research Division of Microsoft
Research in Redmond, WA.