correctly interpreted only within its
context in the XML document (where
all in-scope namespace declarations
can be accessed), or it must be decontextualized by parsing it and replacing
each QName with a context-independent representation. Unfortunately, no
standard exists for this latter approach,
which makes this contextualized content brittle and hard to work with.
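To make the problem concrete, here is a minimal sketch of decontextualizing QNames that appear in content. It assumes the QNames sit in an attribute (the name `type`, the sample document, and the example.com URIs are all invented for illustration), and it uses the `{namespace-uri}local-name` form that ElementTree uses for element names as the context-independent representation. That form is just one convention among several, not a standard.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: the same prefix "a" is bound to two different
# namespaces, so the string "a:widget" means two different things.
SAMPLE = """\
<doc xmlns:a="http://example.com/one">
  <item type="a:widget"/>
  <group xmlns:a="http://example.com/two">
    <item type="a:widget"/>
  </group>
</doc>"""


def expand_qname(qname, in_scope):
    """Rewrite prefix:local as {namespace-uri}local."""
    prefix, sep, local = qname.partition(":")
    if not sep:
        # Unprefixed names are left alone; whether a default namespace
        # applies to QNames in content depends on the vocabulary.
        return qname
    return "{%s}%s" % (in_scope[prefix], local)


def qname_values(xml_text, attr):
    """Collect the expanded form of every QName found in attribute `attr`."""
    bindings = []  # stack of (prefix, uri) pairs currently in scope
    results = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            bindings.append(payload)   # reported before the element starts
        elif event == "end-ns":
            bindings.pop()             # declaration goes out of scope
        elif attr in payload.attrib:
            in_scope = dict(bindings)  # later bindings shadow earlier ones
            results.append(expand_qname(payload.attrib[attr], in_scope))
    return results
```

Calling `qname_values(SAMPLE, "type")` yields two different expanded names for the two textually identical `a:widget` values, which is exactly the information that is lost once the content is copied out of its document.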
The strains described so far manifest themselves in basic XML processing tasks. As soon as XML users begin to
work with business information and
processes, they must confront the
challenge of understanding what XML
structures actually mean. This task
exposes them to a dangerous virus encoded in the catchy slogan that XML is
“self-describing.”
We could be charitable and assume
that when people say XML is self-describing, what they really mean is
“compared with something else that
clearly isn’t.” The least self-describing
information consists of just a stream
of alphanumeric characters of some
text format, as they might be on a
punch card. This delimiter-less encoding does not even make explicit
the tokenization of the characters into
meaningful values, so there is not any
“self” to which any description could
be assigned. The possibility of self-description emerges only when we
separate the values with commas or
some other delimiter character, which
tells us what information components
must be described. XML goes one step
further with the syntactic mechanisms of paired text labels to distinguish the information components
in a stream of text and quotes to associate one bit of information as an
attribute of another. It is certainly
fair to say that XML is on average
more self-describing than other text-based encoding syntaxes, but that is
like saying the average dwarf is taller
than the average baby; neither is tall
enough to excel at basketball.
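The progression from undelimited text to delimiters to markup can be sketched in a few lines. The values, the field widths, and the element and attribute names (`rec`, `f1`, `f2`, `k`) below are invented for illustration. The point is only how much knowledge about tokenization each encoding leaves outside the data.

```python
import csv
import xml.etree.ElementTree as ET

FIXED_WIDTH = "456785020060812"    # no delimiters at all
DELIMITED = "4567,850,20060812"    # commas mark the value boundaries
MARKED_UP = '<rec><f1 k="4567">850</f1><f2>20060812</f2></rec>'


def split_fixed(record, widths):
    """Tokenize using field widths that must be known out of band."""
    fields, pos = [], 0
    for width in widths:
        fields.append(record[pos:pos + width])
        pos += width
    return fields


def split_delimited(record):
    """Tokenize using only the delimiters present in the data itself."""
    return next(csv.reader([record]))


def split_markup(record):
    """Recover labeled components and attribute associations."""
    root = ET.fromstring(record)
    return [(child.tag, dict(child.attrib), child.text) for child in root]
```

Only `split_fixed` needs an external layout such as `(4, 3, 8)`. The other two functions find the component boundaries in the data, and `split_markup` additionally returns labels and the attribute association. None of the three says what the values mean.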
From a more technical perspective, it is also true that XML is self-describing in the limited sense that
the data structure (one of the XML
trees, see tree trauma) can be reconstructed from an XML document (and
maybe its schema, if processing takes
place in an environment susceptible
to default derangement).
When most people say that XML
is self-describing, however, they are
being captured by a delusion that this
refers to actual semantics, overlooking the fact that XML has almost no predefined semantics (the only exception being one predefined attribute for
identifying languages). The disease is most likely caused by the many XML
examples that show element and attribute names that seem to be self-describing because they are labeling
the syntactic components. It could be
prevented with examples that merely
show how the XML markup characters
distinguish the information being described from the markup that is part of
its structural description:
<xxx yyy="4567">850</xxx>
<zzz>20060812</zzz>
Using syntactic mechanisms to provide
clues to the element and attribute semantics is convenient, but this is the cause of a
very common strain of XML fever:
Self-description delusion. XML’s
ability to define names for elements
and attributes, and the widespread
assumption that these names have
some intrinsic semantics, often cause
victims to assume that the semantics
of an XML document are self-evident,
openly available just by looking at it
and understanding the names. Frequently, this strain of XML fever causes
great discomfort when the victims
learn that XML does not deal with semantics, and that common understanding has to be established through
other mechanisms. Victims weakened
by self-description delusion are often
infected by one or more of the intermediate or advanced strains of XML
fever, which promise to easily and permanently cure the pain caused by self-description delusion.
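A short sketch shows what "other mechanisms" amounts to in practice. The two readings below are invented agreements between hypothetical communicating parties. Neither is implied by the markup, and the wrapper element `rec` is added only so the fragment parses as a single document.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

FRAGMENT = '<rec><xxx yyy="4567">850</xxx><zzz>20060812</zzz></rec>'


def parse_yyyymmdd(text):
    """Read an eight-digit string as a calendar date."""
    return datetime.strptime(text, "%Y%m%d").date()


# Two hypothetical out-of-band agreements about the same element names.
ORDER_READING = {
    "xxx": ("price_cents", int),
    "zzz": ("order_date", parse_yyyymmdd),
}
SENSOR_READING = {
    "xxx": ("altitude_m", float),
    "zzz": ("device_serial", str),
}


def interpret(xml_text, reading):
    """Apply an externally agreed reading to the parsed structure."""
    root = ET.fromstring(xml_text)
    result = {}
    for child in root:
        name, convert = reading[child.tag]
        # The yyy attribute is ignored here to keep the sketch short.
        result[name] = convert(child.text)
    return result
```

The parser delivers the same tree in both cases. Under `ORDER_READING` the document describes a price of 850 cents and an order date of August 12, 2006, and under `SENSOR_READING` it describes an altitude of 850.0 meters and a serial number. The difference lives entirely in the agreement, not in the XML.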
Recovery from self-description delusion can take a great deal of personal
commitment and effort. Victims must
learn how to define or adapt an XML
vocabulary, or to adopt technologies
that are explicitly focused on semantics, not just syntax. In either case,
these steps risk exposure to strains of
XML fever beyond the basic types.
Intermediate Strains
If self-description delusion is appropriately diagnosed and treated, XML users