often recover with improved insight.
They now realize that XML’s basic technologies and toolset can be employed
for basic processing tasks involving
structured data, but that most applications involve models of the application
data or processes. XML is based on tree
structures as the basic model, and this
does not always provide the best fit for
application-level models, which can
cause trouble when mapping these
nontree structures to XML:
Tree tremors. Whereas tree trauma
(discussed earlier) is a basic strain of
XML fever caused by the various flavors of trees in XML technologies, tree
tremors are a more serious condition
afflicting victims trying to manage
data in XML that is not inherently tree-structured. The most common causes
are data models requiring nontree
graph structures and document models needing overlapping structures. In
both cases, mapping these models to
XML’s tree model results in XML structures that cannot conveniently represent the application-level model.
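A minimal sketch of the graph case, assuming a hypothetical vocabulary (`graph`, `node`, `edge`, `id`, `ref`) that is not taken from any real schema: because node "c" has two parents, it cannot be nested under both, so the graph is flattened and the edges become references that the application has to resolve itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: node "c" is reachable from both "a" and "b",
# so it cannot be nested inside a single parent element.
DOC = """<graph>
  <node id="a"><edge ref="c"/></node>
  <node id="b"><edge ref="c"/></node>
  <node id="c"/>
</graph>"""


def load_graph(xml_text):
    """Return an adjacency mapping {node id: [target ids]} from the flat encoding."""
    root = ET.fromstring(xml_text)
    graph = {
        node.get("id"): [edge.get("ref") for edge in node.findall("edge")]
        for node in root.findall("node")
    }
    # The XML tree says nothing about whether references resolve,
    # so the application has to check that on its own.
    dangling = {t for targets in graph.values() for t in targets if t not in graph}
    if dangling:
        raise ValueError(f"unresolved references: {sorted(dangling)}")
    return graph


graph = load_graph(DOC)
```

The XML tree here is only a container; the actual model lives in the `id`/`ref` pairs, which is the kind of inconvenient fit the paragraph above describes.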
We often tell students that “the best
thing about XML is the ease with which
you can create a new vocabulary.” But
because XML allows well-formed documents (as opposed to valid documents
that must conform to some schema), it
is actually possible to use vocabularies
that have never been explicitly created:
documents can simply use elements
and attributes that were never declared
(let alone defined) anywhere. Well-formedness can be appropriate during prototyping but is reckless during
deployment and almost certainly subverts interoperability. Unfortunately,
many XML users suffer from a condition that prevents them from seeing
these dangers:
Model myopia. Starting from a prototype based on well-formed documents,
some developers never bother to develop a schema, let alone a well-defined
mapping between such a schema and
the application-level data model. In scenarios leading to this condition, validation is often done only by eye (key phrases for
this technique are “looks good to me”
or “our documents usually sort of look
like these two examples here”), which
makes it impossible to test documents
strictly for correctness. Round-trip
XML-to-model and reverse transformations cannot be reliably implemented,
and assumptions and hacks get built
into systems, which inevitably cause
interoperability problems later on.
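The gap between "well-formed" and "means what the consumer expects" can be sketched as follows. The two order documents and the helper names are hypothetical, invented for illustration; both documents parse cleanly, yet a reader written against the first silently finds nothing in the second.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that "sort of look alike" but encode the price differently.
DOC_A = '<order><item sku="X1"><price currency="USD">9.50</price></item></order>'
DOC_B = '<order><item sku="X1" price="9.50 USD"/></order>'


def is_well_formed(text):
    """True if the text parses as XML; says nothing about its vocabulary."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def read_price(text):
    """Read the price the way a consumer written against DOC_A would."""
    price = ET.fromstring(text).find("item/price")
    if price is None:
        return None  # the structure this consumer assumed is simply absent
    return (price.text, price.get("currency"))
```

Without a schema there is no tool that can say which of the two documents is the wrong one; the disagreement only surfaces when the two implementations meet.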
If model myopia is diagnosed (often
by discovering that two implementations do not interoperate correctly because of different sets of assumptions
built into these implementations),
the key step in curing it is to define a
schema so that the XML structures to
be used in documents are well defined
and can be validated using existing
tools. As soon as this happens, the obvious question is which schema language to use. This can be the beginning
of another troublesome development:
Schema schizophrenia. DTDs are
XML’s built-in schema language, but
they are limited in their expressiveness
and do not support essential XML features (most notably, they do not work
well with XML Namespaces). After considering various alternative languages,
the W3C eventually settled on XSDL, a
rather complex schema language with
built-in modeling capabilities. XSDL’s
expressiveness can directly lead to an
associated infection, brought on by the inability to decide between modeling alternatives:
Schema option paralysis. XSDL’s
complexity allows a given logical model to be encoded in a plethora of ways
(this fever will mutate into an even
more serious threat with the upcoming
XSDL 1.1, which adds new features that
overlap with existing features). A cure
for schema option paralysis is to use
alternative schema languages with a
better separation of concerns (for example,
languages that limit themselves to grammars and leave
data types and path-based constraints
to other languages), most notably RELAX NG.
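As a small illustration of the "plethora of ways," the sketch below holds two XSDL fragments that describe the same hypothetical `person` element, one with an anonymous local type and one with a named global type. The vocabulary and the helper function are assumptions made for this example; the Python code only inspects the schema documents as XML and does not perform schema validation.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Style A: the content model is an anonymous type nested inside the element.
SCHEMA_A = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Style B: the same content model as a named, reusable global type.
SCHEMA_B = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="PersonType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="person" type="PersonType"/>
</xs:schema>"""


def global_names(schema_text, kind):
    """Names of top-level declarations of the given kind ('element' or 'complexType')."""
    root = ET.fromstring(schema_text)
    return sorted(decl.get("name") for decl in root.findall(f"{XS}{kind}"))
```

Both fragments declare the same global element, but only the second exposes a type that other schemas or tools can refer to, so even this tiny model already forces a design decision.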
Using more focused schema languages and targeting a separation of
concerns leaves schema developers
with a choice of schema languages. In
addition, at times it would be ideal to
combine schema languages to capture
more constraints than any one could
enforce on its own. The choice of schema languages, however, is more often
determined by available tool support
and acquired habits than by a thorough
analysis of what would be the most appropriate language.
Since schema schizophrenia (with
occasional bouts of schema option
paralysis) can be a painful and
long-lasting condition, one tempting way
out is not to use schema languages
as the normative encoding form for
models and instead generate schemas
from some more application-oriented
modeling environment or tool. Very
often, however, these tools have a different built-in bias, and they rarely support document modeling. This causes
a very specific problem for generated
schemas:
Mixed content crisis. XML’s origin as
a document representation language
gives it capabilities to represent complex document structures, most notably mixed content, essential in publications and other narrative document
types. Most non-XML modeling environments and tools, however, are data
oriented and lack support for mixed
content. These tools produce XML
structures that look like table dumps
from a relational database, lacking the
nuanced document structures that are
crucial in a document-processing environment.
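A rough sketch of what gets lost, assuming a made-up paragraph element and two hypothetical helper functions: a data-oriented reader that treats child elements as fields sees only the fields, while the narrative lives in the text interleaved between them.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed-content paragraph: text and markup are interleaved.
MIXED = "<p>XML is <em>tree</em> based, but <code>p</code> mixes text and markup.</p>"


def as_record(text):
    """What a data-oriented tool tends to see: child elements as named fields."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def as_narrative(text):
    """What a document-oriented tool needs: all text, in document order."""
    return "".join(ET.fromstring(text).itertext())
```

Here `as_record(MIXED)` keeps only the two inline elements, whereas `as_narrative(MIXED)` recovers the full sentence; a model with no notion of interleaved text has nowhere to put the words between the fields.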
Because the approach of generating schemas has the advantage that
developers of XML schemas never have
to actually write them (or even look at
them), it also can be the cause of one of
the most troubling XML problems that
is often experienced when encountering schemas generated from UML
models or spreadsheets:
Generated schema indigestion. More
abstract models have to be mapped to
XML vocabularies for XML-based information exchange. Most modeling tools
and development environments export
models to XSDL and use that schema
for serializing and parsing instances.
Because of the perniciousness of schema schizophrenia, however, this model-to-schema encoding is complex and
tool dependent. Generated schema
indigestion often afflicts those who try
to use the schema or instances outside
the context of the tools that generated
them. This first contact with generated
schemas can be very frustrating and
distasteful, because unless the same
XML encoding rules are followed in
both contexts, XML might not be easy
to work with and certainly is neither interoperable nor extensible.
These intermediate strains of XML
fever mostly revolve around the problem of how to create and use well-defined descriptions of XML vocabular-