tologies. This is in part a result of free standards-based tools for creating ontologies such as Protégé and SWOOP; just as we mentioned with schema option paralysis, the availability of tools shapes the languages people use and the choices they make. The relative unfamiliarity and the vague “hipness” of the “ontology” world, however, can give XML users anxiety about their ability to adjust to the RDF/OWL world with more rigorous semantics. As a result, they often overcompensate:
Ontology overkill. Operating in an environment that focuses on semantics, victims of ontology overkill tend to overmodel semantics, creating abstractions and associations that are of little value to the application but make the model much harder to understand and use. Ontology overkill forces its sufferers not only to overmodel, but also often to fail at doing so, because it is much harder to define an ontology (in its fullest sense), and to identify, understand, and validate all its implications, than it is to define a controlled vocabulary.
If XML fever sufferers come in contact with communities where Semantic Web ideas are widespread and well established, they quickly discover that most of the knowledge they acquired in the basic and intermediate phases of the XML learning curve no longer applies. The reason for this is that the Semantic Web creates a completely self-contained world on top of other Web technologies, with the only intersection being the fact that resources are identified by URIs. As a result, Semantic Web users become blissfully unaware that the Web may have solutions for them or that there could be a simpler way of solving problems. Among those who see the Semantic Web as the logical next step of the Web’s evolution, we can observe the following condition:
Web blindness. This is a condition in which the victim settles into the Semantic Web to a degree where the non-Semantic Web does not even exist anymore. In the pure Semantic Web, lower-level technologies no longer need to evolve, because every problem can be solved on the semantic layer. Web blindness victims often are only dimly aware that many problems in the real world are and most likely will be solved with technologies other than Semantic Web technologies.
If victims of Web blindness have adjusted to their new environment of abundant RDF and start embracing the new world, they may come in contact with applications that have aggregated large sets of RDF data. While RDF triples are a seemingly simple concept, the true power of RDF lies in the fact that these triples are combined to form interconnected graphs of statements about things, and statements about statements, which quickly makes it impossible to use this dataset without specialized tools. These tools require specialized data storage and specialized languages for accessing these stores. Handling these large sets of data is the leading cause of an RDF-specific ailment:
Triple shock. While RDF itself is simple, large datasets easily contain millions of triples (for truly large datasets this can go up to billions), and managing and querying such a big dataset can become a considerable challenge. If the schema of these large datasets is simple, but ontology overkill has set in and it has been reformulated as an ontology, handling this dataset may become considerably harder, without any immediate benefit.
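As a minimal sketch of why even simple triples quickly form a graph that calls for tooling, the following Python fragment models statements as plain tuples and walks the resulting graph. All `ex:` names and the helper functions are hypothetical and chosen only for illustration; the `rdf:` terms are the standard RDF reification vocabulary. A real application would use a dedicated triple store and query language rather than in-memory lists.

```python
from collections import defaultdict

# Each statement is a (subject, predicate, object) triple.
# All "ex:" identifiers are invented for this illustration.
triples = [
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:berkeley"),
    # A statement about a statement: ex:stmt1 describes the first
    # triple using the RDF reification vocabulary, and a further
    # triple records who asserted it.
    ("ex:stmt1", "rdf:type", "rdf:Statement"),
    ("ex:stmt1", "rdf:subject", "ex:alice"),
    ("ex:stmt1", "rdf:predicate", "ex:knows"),
    ("ex:stmt1", "rdf:object", "ex:bob"),
    ("ex:stmt1", "ex:assertedBy", "ex:carol"),
]


def build_index(statements):
    """Group (predicate, object) pairs by subject for fast lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in statements:
        index[subject].append((predicate, obj))
    return index


def reachable(index, start):
    """Return every node reachable from start by following triples."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # index.get avoids adding empty entries to the defaultdict.
        for _, obj in index.get(node, []):
            stack.append(obj)
    return seen


index = build_index(triples)
from_alice = reachable(index, "ex:alice")
from_stmt = reachable(index, "ex:stmt1")
```

Even in this toy dataset, answering "what is connected to what" already needs an index and a graph traversal; with millions of triples, the same questions require the specialized storage and query languages described above.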
Semantic Web technologies may be the correct choice for projects requiring fully developed ontologies, but Semantic Web technologies have little to do with the plain Web and XML. This means that neither should be regarded as a cure for basic or intermediate XML fevers, and that each has its own set of issues, which are only partially listed here.
We probably cannot prevent these varieties of XML fever, especially the basic strains, because it is undoubtedly a result of the hype and overbroad claims for XML that many people try it in the first place. We can do a better job of inoculating XML novices and users against the intermediate and advanced strains, however, by teaching them that the appropriate use of XML technologies depends on the nature and scope of the problems to which they are applied. Heavyweight XML specifications such as those developed by OASIS, OMG, and other standards organizations are necessary to build robust enterprise-class XML applications, and Semantic Web concepts and tools are prerequisites for knowledge-intensive computation, but more lightweight approaches for structuring and classifying information such as microformats will do in other contexts.
When someone first learns about it, XML may seem like the hammer in the cliché about everything looking like a nail. Those of us who teach XML, write about it, or help others become effective users of it, however, can encourage a more nuanced view of XML tools and technologies that portrays them as a set of hammers of different sizes, with a variety of grips, heads, and claws. We need to point out that not everyone needs a complete set of hammers, but information architects should know how to select the appropriate hammer for the kind of hammering they need to do. And we should always remember that pounding nails is only one of the tasks involved in design and construction.
XML has succeeded beyond the wildest expectations as a convenient format for encoding information in an open and easily computable fashion. But it is just a format, and the difficult work of analyzing and modeling information has not gone away and never will.
References
1. Bell, A. E. Death by UML fever. ACM Queue 2, 1 (Mar. 2004), 72-80.
2. Berners-Lee, T., Hendler, J. A., Lassila, O. The Semantic Web. Scientific American 284, 5 (May 2001), 34-43.
3. Bray, T. On language creation. In Proceedings of XML 2005 (Atlanta, GA, Nov. 2005).
4. Bray, T., Paoli, J., Sperberg-McQueen, C. M. Extensible Markup Language (XML) 1.0. World Wide Web Consortium, Recommendation REC-xml-19980210 (Feb. 1998).
Erik Wilde (dret@berkeley.edu) is a visiting assistant professor in the School of Information at the University of California at Berkeley, where he is also technical director of the Information and Service Design program.
Robert J. Glushko (glushko@ischool.berkeley.edu) is an adjunct professor at the University of California at Berkeley in the School of Information, the director of the Center for Document Engineering, and one of the founding faculty members of the Information and Service Design program.