often recover with improved insight. They now realize that XML’s basic technologies and toolset can be employed for basic processing tasks involving structured data, but that most applications involve models of the application data or processes. XML is based on tree structures as the basic model, and this does not always provide the best fit for application-level models, which can cause trouble when mapping these nontree structures to XML:
Tree tremors. Whereas tree trauma (discussed earlier) is a basic strain of XML fever caused by the various flavors of trees in XML technologies, tree tremors are a more serious condition afflicting victims trying to manage data in XML that is not inherently tree-structured. The most common causes are data models requiring nontree graph structures and document models needing overlapping structures. In both cases, mapping these models to XML’s tree model results in XML structures that cannot conveniently represent the application-level model.
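One common workaround for nontree graph structures can be sketched as follows: flatten the graph into a list of node elements and express edges as references between identifiers. This is a minimal illustration, not a recommended encoding. The `graph`, `node`, and `edge` element names and the `id` and `ref` attributes are hypothetical, and Python's standard `xml.etree.ElementTree` module is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical application-level model: a directed graph with a shared
# node ("c") and a cycle (a -> c -> a), so it cannot be nested as a tree.
edges = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

def graph_to_xml(graph):
    """Serialize the graph as a flat node list; edges become ref attributes."""
    root = ET.Element("graph")
    for node_id, targets in graph.items():
        node = ET.SubElement(root, "node", {"id": node_id})
        for target in targets:
            ET.SubElement(node, "edge", {"ref": target})
    return root

def xml_to_graph(root):
    """Rebuild the adjacency mapping by resolving the ref attributes."""
    return {
        node.get("id"): [edge.get("ref") for edge in node.findall("edge")]
        for node in root.findall("node")
    }

xml_text = ET.tostring(graph_to_xml(edges), encoding="unicode")
round_tripped = xml_to_graph(ET.fromstring(xml_text))
```

The round trip works, but only because both functions share the same convention: the XML tree itself says nothing about the graph, and every consumer has to know how to interpret the references.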
We often tell students that “the best thing about XML is the ease with which you can create a new vocabulary.” But because XML allows well-formed documents (as opposed to valid documents that must conform to some schema), it is actually possible to use vocabularies that have never been explicitly created: documents can simply use elements and attributes that were never declared (let alone defined) anywhere. Well-formedness can be appropriate during prototyping but is reckless during deployment and almost certainly subverts interoperability. Unfortunately, many XML users suffer from a condition that prevents them from seeing these dangers:
Model myopia. Starting from a prototype based on well-formed documents, some developers never bother to develop a schema, let alone a well-defined mapping between such a schema and the application-level data model. In scenarios leading to this condition, validation often is only by eye (key phrases for this technique are “looks good to me” or “our documents usually sort of look like these two examples here”), which makes it impossible to test documents strictly for correctness. Round-trip XML-to-model and reverse transformations cannot be reliably implemented, and assumptions and hacks get built into systems, which inevitably cause interoperability problems later on.
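The following sketch shows how well-formedness alone hides such assumptions. The two documents and the `read_price` consumer are invented for illustration, and Python's standard `xml.etree.ElementTree` parser (which does not validate) is assumed.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents produced by implementations that were built
# on different, undocumented assumptions about how a price is encoded.
doc_a = "<order><price currency='USD'>10.00</price></order>"
doc_b = "<order><price>10.00 USD</price></order>"

# Both are well-formed, so a non-validating parser accepts them silently.
tree_a = ET.fromstring(doc_a)
tree_b = ET.fromstring(doc_b)

def read_price(order):
    """Consumer written against doc_a's shape: numeric text plus a currency attribute."""
    price = order.find("price")
    return float(price.text), price.get("currency")

price_a = read_price(tree_a)

# The mismatch only surfaces deep inside application code, not at parse time.
try:
    read_price(tree_b)
    b_failed = False
except ValueError:
    b_failed = True
```

Neither document is wrong as far as the parser is concerned. Without a schema that both sides validate against, the disagreement shows up only when one implementation consumes the other's output.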
If model myopia is diagnosed (often by discovering that two implementations do not interoperate correctly because of different sets of assumptions built into these implementations), the key step in curing it is to define a schema so that the XML structures to be used in documents are well defined and can be validated using existing tools. As soon as this happens, the obvious question is which schema language to use. This can be the beginning of another troublesome development:
Schema schizophrenia. DTDs are XML’s built-in schema language, but they are limited in their expressiveness and do not support essential XML features (most notably, they do not work well with XML Namespaces). After considering various alternative languages, the W3C eventually settled on XSDL, a rather complex schema language with built-in modeling capabilities. XSDL’s expressiveness can directly lead to an associated infection, brought on by the inability to decide between modeling alternatives:
Schema option paralysis. XSDL’s complexity allows a given logical model to be encoded in a plethora of ways (this fever will mutate into an even more serious threat with the upcoming XSDL 1.1, which adds new features that overlap with existing features). A cure for schema option paralysis is to use alternative schema languages with a better separation of concerns (for example, languages that limit themselves to grammars and leave data types and path-based constraints to other languages), most notably RELAX NG.
Using more focused schema languages and targeting a separation of concerns leaves schema developers with a choice of schema languages. In addition, at times it would be ideal to combine schema languages to capture more constraints than any one could enforce on its own. The choice of schema languages, however, is more often determined by available tool support and acquired habits than by a thorough analysis of what would be the most appropriate language.
Since schema schizophrenia (with occasional bouts of schema option paralysis) can be a painful and long-lasting condition, one tempting way out is not to use schema languages as the normative encoding form for models and instead generate schemas from some more application-oriented modeling environment or tool. Very often, however, these tools have a different built-in bias, and they rarely support document modeling. This causes a very specific problem for generated schemas:
Mixed content crisis. XML’s origin as a document representation language gives it capabilities to represent complex document structures, most notably mixed content, essential in publications and other narrative document types. Most non-XML modeling environments and tools, however, are data oriented and lack support for mixed content. These tools produce XML structures that look like table dumps from a relational database, lacking the nuanced document structures that are crucial in a document-processing environment.
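To make mixed content concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`, which stores the text before the first child in `.text` and the text following each child in that child's `.tail`. The sample paragraph is invented for illustration.

```python
import xml.etree.ElementTree as ET

# A paragraph with mixed content: character data interleaved with elements.
para = ET.fromstring(
    "<p>XML began as a <em>document</em> format, not a <em>data</em> format.</p>"
)

# Walk the content in document order: leading text, then each child's
# own text followed by the text that trails it.
pieces = [para.text] + [
    piece for child in para for piece in (child.text, child.tail)
]
full_text = "".join(para.itertext())

# A data-oriented view that keeps only child element values, as a
# table-like mapping would, loses the surrounding narrative entirely.
fields_only = [child.text for child in para]
```

Here `fields_only` contains just the two emphasized words, which is roughly what remains when a tool without a notion of mixed content treats every element as a field.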
Because the approach of generating schemas has the advantage that developers of XML schemas never have to actually write them (or even look at them), it also can be the cause of one of the most troubling XML problems that is often experienced when encountering schemas generated from UML models or spreadsheets:
Generated schema indigestion. More abstract models have to be mapped to XML vocabularies for XML-based information exchange. Most modeling tools and development environments export models to XSDL and use that schema for serializing and parsing instances. Because of the perniciousness of schema schizophrenia, however, this model-to-schema encoding is complex and tool dependent. Generated schema indigestion often afflicts those who try to use the schema or instances outside the context of the tools that generated them. This first contact with generated schemas can be very frustrating and distasteful, because unless the same XML encoding rules are followed in both contexts, XML might not be easy to work with and certainly is neither interoperable nor extensible.
These intermediate strains of XML fever mostly revolve around the problem of how to create and use well-defined descriptions of XML vocabular-