correctly interpreted only within its context in the XML document (where all in-scope namespace declarations can be accessed), or it must be decontextualized by parsing it and replacing each QName with a context-independent representation. Unfortunately, no standard exists for this latter approach, which makes this contextualized content brittle and hard to work with.
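The decontextualization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document, the `type` attribute, and the namespace URIs are hypothetical, and the `{uri}local` output form is borrowed from the informal convention that ElementTree uses for element names, not from any standard for QNames in content.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: the attribute value "po:Order" is a QName in content.
# The prefix "po" is bound to different URIs in the two scopes.
DOC = b"""<root xmlns:po="http://example.com/po">
  <item type="po:Order"/>
  <group xmlns:po="http://example.com/other">
    <item type="po:Order"/>
  </group>
</root>"""


def decontextualize(xml_bytes, attr="type"):
    """Rewrite each prefixed `attr` value as {namespace-uri}local-name."""
    scopes = []   # stack of (prefix, uri) declarations currently in scope
    results = []
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            scopes.append(payload)      # declaration enters scope
        elif event == "end-ns":
            scopes.pop()                # declaration leaves scope
        else:
            value = payload.get(attr)
            # Unprefixed values are skipped: how they relate to a default
            # namespace depends on the vocabulary, so no guess is made here.
            if value and ":" in value:
                prefix, local = value.split(":", 1)
                # The innermost declaration of the prefix wins.
                uri = next((u for p, u in reversed(scopes) if p == prefix), None)
                results.append(f"{{{uri}}}{local}" if uri else value)
    return results


resolved = decontextualize(DOC)
```

The two `type` attributes carry the same characters but resolve to different names, which is why the raw string cannot be moved or compared without its surrounding declarations.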

The strains described so far manifest themselves in basic XML processing tasks. As soon as XML users begin work with business information and processes, they must confront the challenge of understanding what XML structures actually mean. This task exposes them to a dangerous virus encoded in the catchy slogan that XML is “self-describing.”

We could be charitable and assume that when people say XML is self-describing, what they really mean is “compared with something else that clearly isn’t.” The least self-describing information consists of just a stream of alphanumeric characters of some text format, as they might be on a punch card. This delimiter-less encoding does not even make explicit the tokenization of the characters into meaningful values, so there is not any “self” to which any description could be assigned. The possibility of self-description emerges only when we separate the values with commas or some other delimiter character, which tells us what information components must be described. XML goes one step further with the syntactic mechanisms of paired text labels to distinguish the information components in a stream of text and quotes to associate one bit of information as an attribute of another. It is certainly fair to say that XML is on average more self-describing than other text-based encoding syntaxes, but that is like saying the average dwarf is taller than the average baby; neither is tall enough to excel at basketball.
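The progression from undelimited text to delimited values to labeled markup can be made concrete with a small sketch. The record values, the field widths, and the element and attribute names below are hypothetical and chosen only for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# 1. Undelimited text: the tokens exist only if the reader already knows
#    the field widths (assumed here to be 4, 4, and 8 characters).
FIXED = "4567085020060812"
WIDTHS = (4, 4, 8)


def split_fixed(record, widths):
    """Cut a fixed-width record into fields using out-of-band widths."""
    fields, pos = [], 0
    for width in widths:
        fields.append(record[pos:pos + width])
        pos += width
    return fields


fixed_fields = split_fixed(FIXED, WIDTHS)

# 2. Delimited text: the commas make the tokenization explicit,
#    but the values are distinguished only by position.
DELIMITED = "4567,850,20060812"
csv_fields = next(csv.reader(io.StringIO(DELIMITED)))

# 3. XML: paired labels and a quoted attribute name each component,
#    yet the names are still only labels.
MARKUP = '<r a="4567"><b>850</b><c>20060812</c></r>'
root = ET.fromstring(MARKUP)
xml_fields = [("a", root.get("a"))] + [(child.tag, child.text) for child in root]
```

Each step makes more of the structure explicit in the data itself, but none of them says what the three values mean.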

From a more technical perspective, it is also true that XML is self-describing in the limited sense that the data structure (one of the XML trees, see tree trauma) can be reconstructed from an XML document (and maybe its schema, if processing takes place in an environment susceptible to default derangement).

While XML's success is well earned as the first truly universal standard for structured data, it must now deal with numerous problems that have grown up around it. These are not entirely the fault of XML itself, but instead can be attributed to exaggerated claims and ideas of what XML is and what it can do.

When most people say that XML is self-describing, however, they are being captured by a delusion that this refers to actual semantics, overlooking the fact that XML has almost no predefined semantics (the only exception being one predefined attribute for identifying languages). The disease is most likely caused by the many XML examples that show element and attribute names that seem to be self-describing because they are labeling the syntactic components. It could be prevented with examples that merely show how the XML markup characters distinguish the information being described from the markup that is part of its structural description:

<xxx yyy="4567">850</xxx> <zzz>20060812</zzz>
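As a rough illustration of what a parser can and cannot recover from this example, the following Python sketch parses the fragment with the standard library. The wrapping `doc` element is an assumption added here only because the fragment has two top-level elements and so is not a well-formed document on its own.

```python
import xml.etree.ElementTree as ET

# The fragment from the text, written with straight quotes.
FRAGMENT = '<xxx yyy="4567">850</xxx> <zzz>20060812</zzz>'

# Hypothetical root element, needed only to make the input well-formed.
root = ET.fromstring(f"<doc>{FRAGMENT}</doc>")


def describe(element):
    """List everything the parser knows about each child element."""
    return [(child.tag, dict(child.attrib), child.text) for child in element]


structure = describe(root)
```

The result contains names, nesting, and character strings, and nothing else. Whether 20060812 is a date, a part number, or a quantity is not something the parser can report, because that information is not in the document.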

 

Using syntactic mechanisms to provide clues to the element and attribute semantics is convenient, but this is the cause of a very common strain of XML fever:

Self-description delusion. XML’s ability to define names for elements and attributes, and the widespread assumption that these names have some intrinsic semantics, often cause victims to assume that the semantics of an XML document are self-evident, openly available just by looking at it and understanding the names. Frequently, this strain of XML fever causes great discomfort when the victims learn that XML does not deal with semantics, and that common understanding has to be established through other mechanisms. Victims weakened by self-description delusion are often infected by one or more of the intermediate or advanced strains of XML fever, which promise to easily and permanently cure the pain caused by self-description delusion.

Recovery from self-description delusion can take a great deal of personal commitment and effort. Victims must learn how to define or adapt an XML vocabulary, or to adopt technologies that are explicitly focused on semantics, not just syntax. In either case, these steps risk exposure to strains of XML fever beyond the basic types.

 

Intermediate Strains

If self-description delusion is appropriately diagnosed and treated, XML users
