ies. Before we continue to describe the
more advanced strains of XML fever
that may result from these intermediate fevers and attempts to cure them,
it is important to point out that a good
way of avoiding them is to reuse existing XML languages, thus avoiding the
efforts and risks of inventing something new.
In an online follow-up to “On Language Creation,” 3 Tim Bray (one of the
creators of XML) says, “If you’re going
to be designing a new XML language,
first of all, consider not doing it.” This
is a very important point, because the
ubiquity of XML makes it likely that
for any given problem, somebody else
might have already encountered it
and solved it. Alternatively, it might
be possible to divide the problem into
smaller parts or to map it to a more
general problem, and to find existing
solutions for these.
Of course, there is a chance that no
prior work exists or that the available
solutions are unsatisfactory, but it really is worth the effort to evaluate existing solutions because a vocabulary can
represent hundreds or even thousands
of hours of analysis and encoding. For
example, the Universal Business Language (UBL), a set of information building blocks common to business transactions and several dozen standard
documents that reuse them, is the result of years of work by numerous XML
and business experts—and the UBL
effort itself began in 2001 by building
on the XML Common Business Library
(xCBL), on which work began in 1997.
We always tell students the worst
thing about XML is the same as the
best thing: the ease with which you can
create a new vocabulary. Language design is fundamentally hard, but XML
has made it deceptively simple by lowering the syntactic threshold. The conceptual tasks of creating shared vocabularies that are globally understood,
well defined in every necessary respect,
and reasonably easy to use have not
been made easier by XML. XML has
merely given us a good toolset for describing
and working with these languages once
we have them; defining them is still
hard work.
This, of course, is not a secret to
computer scientists, and the fact that
XML has no semantics when they are
essential to meaningful information
exchange led to the idea of the Semantic Web. 2 The value proposition of the
Semantic Web is compelling: a common way of representing semantics
makes it easier to express, understand,
exchange, share, merge, and agree on
them. The Semantic Web, however, is
also the leading cause of the more advanced strains of XML fever.
Advanced Strains
If semantics are important, and since
an XML schema defines only structures (that is, syntax), then semantics
must be specified in some other way.
This can happen informally by prose
describing the meaning of the individual components and parts of a schema,
or more formally, by using some model
for specifying semantics. The Semantic Web is the most popular candidate
for such an environment; it is based on
a model for making statements about
resources, the Resource Description
Framework (RDF), with various technologies layered on top of that, such as
those for describing schemas for RDF.
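As a rough illustration of what "statements about resources" means, the sketch below models RDF statements as subject, predicate, object triples held in a flat collection. The `Statement` type, the `about` helper, and the `example.org` URIs are hypothetical names chosen for this example; only the Dublin Core namespace is a real vocabulary, and nothing here is an actual RDF library API.

```python
from typing import List, NamedTuple

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


class Statement(NamedTuple):
    """One RDF-style statement: a resource, a property, and a value."""
    subject: str
    predicate: str
    obj: str


# A small collection of statements. Note that there is no root element
# and no nesting; the statements are independent of one another.
statements = [
    Statement("http://example.org/doc", DC + "title", "XML Fever"),
    Statement("http://example.org/doc", DC + "creator",
              "http://example.org/people/alice"),
    Statement("http://example.org/people/alice", DC + "title", "Editor"),
]


def about(resource: str, graph: List[Statement]) -> List[Statement]:
    """Return every statement whose subject is the given resource."""
    return [s for s in graph if s.subject == resource]
```

Calling `about("http://example.org/doc", statements)` returns the two statements made about that document. The point of the sketch is only that the unit of information is the individual statement, not a position in a document tree.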
One important observation about
the Semantic Web that is often missed
is that it introduces not only models
for semantics (various schema languages for RDF), but also a new data
model, which means that XML’s tree
structures are no longer the core data
structures for representing data. RDF
can be expressed in XML, but there are
many different ways of doing it, which
can cause a very specific illness:
RDF rage. RDF’s most widely used
syntax is XML based, but there are
many different ways in which the same
set of RDF triples can be expressed
as XML, so working with RDF data is
almost impossible using basic XML
tools, even for simple tasks such as
comparing RDF data. This inability to
use a seemingly related toolset for a
seemingly related task often is the first
symptom through which XML users
learn that they are now suffering from
more advanced strains of XML fever.
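The comparison problem can be sketched in a few lines of Python using only the standard library. The two documents below are assumed examples that state the same single triple, once with the property written as a child element and once with RDF/XML's property-attribute abbreviation. The `tree_shape` and `triples` helpers are hypothetical, and `triples` handles only these two forms; it is not a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Form A: the property is written as a child element.
DOC_A = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>XML Fever</dc:title>
  </rdf:Description>
</rdf:RDF>
"""

# Form B: the same property is written as an attribute.
DOC_B = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/doc" dc:title="XML Fever"/>
</rdf:RDF>
"""


def tree_shape(elem):
    """What a basic XML tool sees: tag, attributes, text, and children."""
    return (elem.tag, tuple(sorted(elem.attrib.items())),
            (elem.text or "").strip(),
            tuple(tree_shape(child) for child in elem))


def to_uri(qname):
    """Turn ElementTree's '{namespace}local' name into 'namespace' + 'local'."""
    namespace, local = qname[1:].split("}", 1)
    return namespace + local


def triples(xml_text):
    """Extract triples from the two forms above only (illustrative)."""
    found = set()
    root = ET.fromstring(xml_text)
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        subject = node.attrib[f"{{{RDF_NS}}}about"]
        # Non-RDF attributes on the node are literal-valued properties.
        for name, value in node.attrib.items():
            if not name.startswith(f"{{{RDF_NS}}}"):
                found.add((subject, to_uri(name), value))
        # Child elements are properties with their text as the value.
        for child in node:
            found.add((subject, to_uri(child.tag), (child.text or "").strip()))
    return found


same_tree = tree_shape(ET.fromstring(DOC_A)) == tree_shape(ET.fromstring(DOC_B))
same_triples = triples(DOC_A) == triples(DOC_B)
```

Here `same_tree` is false while `same_triples` is true: a tree-level comparison reports two different documents even though they carry identical RDF data. Real RDF/XML allows many more variations than these two, which is why a dedicated RDF parser is needed rather than ad hoc XML processing.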
In a more classical view of information organization, the meaning of
terms can be specified in a variety of
ways. Ordered by complexity, popular
approaches are controlled vocabularies, taxonomies, thesauri, and ontologies. RDF can be used to implement
any of these concepts, but RDF schemas are most often referred to as on-