tologies. This is in part a result of free, standards-based tools for creating ontologies, such as Protégé and SWOOP;
just as we mentioned with schema option paralysis, the availability of tools
shapes the languages people use and
the choices they make. The relative
unfamiliarity and the vague “hipness”
of the “ontology” world, however, can
give XML users anxiety about their ability to adjust to the RDF/OWL world and its more rigorous semantics. As a result,
they often overcompensate:
Ontology overkill. Operating in an environment that focuses on semantics, victims of ontology overkill tend to overmodel, creating abstractions and associations that are of little value to the application but make the model much harder to understand and use. Ontology overkill forces its sufferers not only to overmodel, but often also to fail at doing so, because it is much harder to define an ontology (in its fullest sense), and to identify, understand, and validate all its implications, than it is to define a controlled vocabulary.
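To make this contrast concrete, here is a minimal sketch in Python using the rdflib library; the namespace and the document terms are hypothetical, chosen only for illustration:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDF, RDFS, SKOS

    EX = Namespace("http://example.org/terms/")  # hypothetical namespace
    g = Graph()

    # A controlled vocabulary: a flat list of labeled terms is often
    # all the application actually needs.
    for term in ("invoice", "receipt", "purchaseOrder"):
        g.add((EX[term], RDF.type, SKOS.Concept))
        g.add((EX[term], SKOS.prefLabel, Literal(term)))

    # Ontology overkill: the same terms buried under speculative
    # abstractions and associations of little value to the application.
    g.add((EX.BusinessDocument, RDF.type, OWL.Class))
    g.add((EX.FinancialArtifact, RDF.type, OWL.Class))
    g.add((EX.BusinessDocument, RDFS.subClassOf, EX.FinancialArtifact))
    g.add((EX.invoice, RDF.type, EX.BusinessDocument))

The first half of the sketch is complete as it stands; the second half is only the beginning of the class hierarchies, property definitions, and consistency checks that a full ontology would demand.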
If XML fever sufferers come in contact with communities where Semantic Web ideas are widespread and well
established, they quickly discover that
most of their knowledge acquired in
the basic and intermediate phases of
the XML learning curve does not apply anymore. The reason for this is that
the Semantic Web creates a completely
self-contained world on top of other
Web technologies, with the only intersection being the fact that resources are
identified by URIs. As a result, Semantic
Web users become blissfully unaware
that the Web may have solutions for
them or that there could be a simpler
way of solving problems. Seeing the
Semantic Web as the logical next step
of the Web’s evolution, we can observe
the following condition:
Web blindness. This is a condition
in which the victim settles into the
Semantic Web to a degree where the
non-Semantic Web does not even exist
anymore. In the pure Semantic Web,
lower-level technologies no longer
need to evolve, because every problem
can be solved on the semantic layer.
Web blindness victims often are only
dimly aware that many problems in
the real world are, and most likely will continue to be, solved with technologies other than Semantic Web technologies.
If victims of Web blindness have
adjusted to their new environment of
abundant RDF and start embracing the
new world, they may come in contact
with applications that have aggregated
large sets of RDF data. While RDF triples are a seemingly simple concept, the true power of RDF lies in the fact that triples combine into interconnected graphs of statements about things, and statements about statements, which quickly makes such datasets impossible to use without specialized tools. These tools in turn require specialized data storage and specialized query languages, such as SPARQL, for accessing those stores. Handling these large sets of data is the leading cause of an RDF-specific ailment:
Triple shock. While RDF itself is simple, large datasets easily contain millions of triples (truly large ones run to billions), and managing and querying data at that scale can become a considerable challenge. If the schema of such a dataset is simple but ontology overkill has set in and the schema has been recast as a full-blown ontology, handling the data becomes considerably harder still, without any immediate benefit.
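As a small illustration, here is a sketch in Python using the rdflib library; the ex: namespace and the data are hypothetical. Even a handful of triples already involves graph-shaped data, a statement about a statement (RDF reification), and a specialized query language (SPARQL):

    from rdflib import Graph, Namespace
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/")  # hypothetical namespace
    g = Graph()
    g.bind("ex", EX)

    # A statement about a thing: "Alice knows Bob."
    g.add((EX.alice, EX.knows, EX.bob))

    # A statement about that statement, via RDF reification:
    # "Carol asserts that Alice knows Bob."
    stmt = EX.stmt1
    g.add((stmt, RDF.type, RDF.Statement))
    g.add((stmt, RDF.subject, EX.alice))
    g.add((stmt, RDF.predicate, EX.knows))
    g.add((stmt, RDF.object, EX.bob))
    g.add((stmt, EX.assertedBy, EX.carol))

    # Even this tiny graph is queried with SPARQL rather than by
    # navigating a document tree.
    query = """
        PREFIX ex: <http://example.org/>
        SELECT ?who WHERE { ex:alice ex:knows ?who }
    """
    for row in g.query(query):
        print(row.who)  # prints http://example.org/bob

In-memory toy graphs like this one are trivial; the shock sets in when the same operations must run against millions or billions of triples, which is precisely the scale that dedicated triple stores and SPARQL engines exist to handle.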
Semantic Web technologies may be the correct choice for projects requiring fully developed ontologies, but they have little to do with the plain Web and XML. They should therefore not be regarded as a cure for basic or intermediate XML fevers, and they come with their own set of ailments, of which only some are listed here.
The Prescription
We probably cannot prevent these varieties of XML fever, especially the basic strains: it is undoubtedly the hype and overbroad claims made for XML that lead many people to try it in the first place. We can do a better job of
first place. We can do a better job of
inoculating XML novices and users
against the intermediate and advanced
strains, however, by teaching them that
the appropriate use of XML technologies depends on the nature and scope
of the problems to which they are applied. Heavyweight XML specifications
such as those developed by OASIS,
OMG, and other standards organizations are necessary to build robust enterprise-class XML applications, and
Semantic Web concepts and tools are
prerequisites for knowledge-intensive
computation, but more lightweight approaches to structuring and classifying information, such as microformats, will do in other contexts.
When someone first learns about it,
XML may seem like the hammer in the
cliché about everything looking like a
nail. Those of us who teach XML, write
about it, or help others become effective users of it, however, can encourage
a more nuanced view of XML tools and
technologies that portrays them as a set
of hammers of different sizes, with a variety of grips, heads, and claws. We need
to point out that not everyone needs a
complete set of hammers, but information architects should know how to
select the appropriate hammer for the
kind of hammering they need to do.
And we should always remember that
pounding nails is only one of the tasks
involved in design and construction.
XML has succeeded beyond the
wildest expectations as a convenient
format for encoding information in
an open and easily computable fashion.
But it is just a format, and the difficult work of analyzing and modeling information has not gone away and never will.
Erik Wilde (dret@berkeley.edu) is a visiting assistant
professor in the School of Information at the University of
California at Berkeley, where he is also technical director
of the Information and Service Design program.
Robert J. Glushko (glushko@ischool.berkeley.edu) is
an adjunct professor at the University of California at
Berkeley in the School of Information, the director of the
Center for Document Engineering, and one of the founding
faculty members of the Information and Service Design
program.