tion sources but search the Web for
relevant information in much the same
way a human user might when planning a vacation.
A major difficulty in realizing this
goal is that most Web content is primarily intended for presentation to
and consumption by human users;
HTML markup is primarily concerned
with layout, size, color, and other presentation issues. Moreover, Web pages
increasingly use images, often with active links, to present information; even
when content is annotated, the annotations typically take the form of natural-language strings and tags. Human users are (usually) able to interpret the
significance of such features and thus
understand the information being presented, a task that may not be so easy
for software agents.
This vision of a semantic Web is extremely ambitious and would require
solving many long-standing research
problems in knowledge representation and reasoning, databases, computational linguistics, computer vision,
and agent systems. One such problem
is the trade-off between conflicting requirements for expressive power in the
language used for semantic annotations and the scalability of the systems
used to process them7; another is that
integrating different ontologies may
prove to be at least as difficult as integrating the resources they describe.18
Emerging problems include how to
create suitable annotations and ontologies and how to deal with the variable
quality of Web content.
Illustration by Mia Angelica Balaquiot
Notwithstanding such problems,
considerable progress is being made
in the infrastructure needed to support
the semantic Web, particularly in the
development of languages and tools
for content annotation and the design
and deployment of ontologies. My aim
here is to show that even if a full
realization of the semantic Web is still
a long way off, semantic Web technologies already have an important influence on the development of information technology.
Semantic Annotation
The difficulty of sharing and processing Web content, or resources, derives
in part from the fact that much of it
(such as text, images, and video) is unstructured; for example, a Web page
might include the following unstructured text:
Harry Potter has a pet named Hedwig.
As it stands, it would be difficult or
impossible for a software agent (such
as a search engine) to recognize the fact
that this resource describes a young
wizard and his pet owl. We might try
to make it easier for agents to process
Web content by adding annotation
tags (such as Wizard and Snowy Owl).
However, such tags are of only limited
value. First, the problem of understanding the terms used in the text is
simply transformed into the problem
of understanding the terms in the tags;
for example, a query for information
about raptors may not retrieve the text,
even though owls are raptors. Moreover, the relationship between Harry
Potter and Hedwig is not captured in
these annotations, so a query asking
for wizards having pet owls might not
retrieve Harry Potter.
We might also want to integrate
information from multiple sources;
for example, rather than coin our own
term for Snowy Owl, we might want to
point to the relevant term in a resource
providing definitive information about
owls. RDF is a language that provides a
flexible mechanism for describing Web
resources and the relationships among
them.14 A key feature of RDF is its use
of internationalized resource identifiers (IRIs), a generalization of uniform
resource locators (URLs), to refer to
resources. Using IRIs facilitates information integration by allowing RDF to
directly reference non-local resources.
IRIs are typically long strings (such as hogwarts.net/HarryPotter),
though abbreviation mechanisms are
available; here, I usually omit the prefix
and just write HarryPotter.
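The relationship between a full IRI and its abbreviated form can be illustrated with a minimal Python sketch. The prefix label hw, the http:// scheme, and the helper names expand and abbreviate are assumptions introduced only for this illustration; they are not part of RDF itself or of any particular RDF library.

```python
# Hypothetical prefix table: the "hw" label and the "http://" scheme are
# assumptions made for this sketch, not taken from the article.
PREFIXES = {"hw": "http://hogwarts.net/"}


def expand(name: str, default_prefix: str = "hw") -> str:
    """Turn 'hw:HarryPotter' or a bare 'HarryPotter' into a full IRI."""
    prefix, sep, local = name.partition(":")
    if not sep:  # no prefix given, so fall back to the default one
        prefix, local = default_prefix, name
    return PREFIXES[prefix] + local


def abbreviate(iri: str) -> str:
    """Shorten a full IRI to prefix:local form when a known prefix matches."""
    for prefix, base in PREFIXES.items():
        if iri.startswith(base):
            return f"{prefix}:{iri[len(base):]}"
    return iri  # unknown namespace: leave the IRI unchanged


if __name__ == "__main__":
    full = expand("HarryPotter")
    print(full)
    print(abbreviate(full))
```

Because two sources that use the same full IRI are referring to the same resource, an agent can compare expanded forms rather than local names, which is what makes pointing to a term defined elsewhere practical.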
RDF is a simple language; its underlying data structure is a labeled
directed graph, and its only syntactic
construct is the triple, which consists
of three components, referred to as
subject, predicate, and object. A triple
represents a single edge (labeled with
the predicate) connecting two nodes
(labeled with the subject and object);
it describes a binary relationship between the subject and object via the
predicate. For example, we might describe the relationship between Harry
and Hedwig using this triple: