ies. Before we continue to describe the more advanced strains of XML fever that may result from these intermediate fevers and attempts to cure them, it is important to point out that a good way of avoiding them is to reuse existing XML languages, thus avoiding the efforts and risks of inventing something new.

In an online follow-up to “On Language Creation,” 3 Tim Bray (one of the creators of XML) says, “If you’re going to be designing a new XML language, first of all, consider not doing it.” This is a very important point, because the ubiquity of XML makes it likely that for any given problem, somebody else might have already encountered it and solved it. Or for a given problem, it might be possible to divide it into smaller parts or to map it to a more general problem, and to find existing solutions for these.

Of course, there is a chance that no prior work exists or that the available solutions are unsatisfactory, but it really is worth the effort to evaluate existing solutions because a vocabulary can represent hundreds or even thousands of hours of analysis and encoding. For example, the Universal Business Language (UBL), a set of information building blocks common to business transactions and several dozen standard documents that reuse them, is the result of years of work by numerous XML and business experts—and the UBL effort itself began in 2001 by building on the XML Common Business Library (xCBL), on which work began in 1997.

We always tell students the worst thing about XML is the same as the best thing: the ease with which you can create a new vocabulary. Language design is fundamentally hard, but XML has made it deceptively simple by lowering the syntactic threshold. The conceptual tasks of creating shared vocabularies that are globally understood, well defined in every necessary respect, and reasonably easy to use have not been made easier by XML. XML has just given us a good toolset to describe and work with these languages once we have them, but defining them still is hard work.

This, of course, is not a secret to computer scientists, and the fact that XML has no semantics when they are essential to meaningful information

We always tell
students the worst
thing about XmL
is the same as the
best thing: the ease
with which you
can create a new
vocabulary.

exchange led to the idea of the Semantic Web. 2 The value proposition of the Semantic Web is compelling: a common way of representing semantics makes it easier to express, understand, exchange, share, merge, and agree on them. The Semantic Web, however, is also the leading cause of the more advanced strains of XML fever.

advanced Strains

If semantics are important, and since an XML schema defines only structures (that is, syntax), then semantics must be specified in some other way. This can happen informally by prose describing the meaning of the individual components and parts of a schema, or more formally, by using some model for specifying semantics. The Semantic Web is the most popular candidate for such an environment; it is based on a model for making statements about resources, the Resource Description Framework (RDF), with various technologies layered on top of that, such as those for describing schemas for RDF.

One important observation about the Semantic Web that is often missed is that it introduces not only models for semantics (various schema languages for RDF), but also a new data model, which means that XML’s tree structures are no longer the core data structures for representing data. RDF can be expressed in XML, but there are many different ways of doing it, which can cause a very specific illness:

RDF rage. RDF’s most widely used syntax is XML based, but there are many different ways in which the same set of RDF triples can be expressed as XML, so working with RDF data is almost impossible using basic XML tools, even for simple tasks such as comparing RDF data. This inability to use a seemingly related toolset for a seemingly related task often is the first symptom through which XML users learn that they are now suffering from more advanced strains of XML fever.

In a more classical view of information organization, the meaning of terms can be specified in a variety of ways. Ordered by complexity, popular approaches are controlled vocabularies, taxonomies, thesauri, and ontologies. RDF can be used to implement any of these concepts, but RDF schemas are most often referred to as on-

References:

Archives