
XML Conferences #1: XML Open, Cambridge, UK


This O'Reilly-sponsored conference looks like becoming the European equivalent of the Extreme XML conferences. There were many practical speakers, such as Michael Kay and Uche Ogbuji.

They flew me out from Australia as a keynote speaker, which was nice. (Except, of course, I caught a cold and was bedridden for a week after. Murata Makoto spoke, and he caught a cold on the way there. Long-distance air travel is dangerous: is it because of being cooped up, or tired, or just contact with strangers?)

I presented my Document Complexity Metric for the first time. This is a magic number derivable from DTDs or document sets, which provides an objective (though challengeable) measure of how complex a document is, and therefore how much work is likely to be involved in writing a stylesheet for it. I have tested hundreds of real technical documents, most of them between 60K and 1 MB, using all sorts of proprietary DTDs, to see whether the metric provides a better indication of complexity than just a raw count of elements. (It does, comparing the metrics with project managers' recollections of the jobs.)

This metric is also useful for testing how successful a DTD-trimming tool is. (No surprises: Topologi is releasing such a tool.) During the questions, Sean McGrath mentioned his company's alternative strategy for handling the problem of using standard DTDs. The problem is that you do not want to program for every element: standard DTDs often have hundreds of elements, while real document sets rarely use more than 50 or so. His company applies an 80/20 rule and just handles the most commonly occurring 80%. (I suppose this leaves any significant elements among the remaining rare ones to be dealt with in a second iteration. Pre-emptive YAGNI.)
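As a rough illustration of that 80/20 approach, here is a minimal Python sketch. It is not Sean McGrath's tooling: the function names are invented for this example, and it assumes that "the most common 80%" means the smallest set of element names that accounts for 80% of element occurrences in a document set, which is only one possible reading.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def element_counts(xml_texts):
    """Count how often each element name occurs across a set of documents."""
    counts = Counter()
    for text in xml_texts:
        root = ET.fromstring(text)
        for element in root.iter():  # root.iter() includes the root itself
            counts[element.tag] += 1
    return counts


def top_coverage(counts, share=0.8):
    """Return element names, most frequent first, until `share` of all
    element occurrences is covered (hypothetical helper, assumed reading
    of the 80/20 rule)."""
    total = sum(counts.values())
    if total == 0:
        return []
    chosen, covered = [], 0
    for tag, n in counts.most_common():
        if covered / total >= share:
            break
        chosen.append(tag)
        covered += n
    return chosen
```

Anything not in the returned list would be the rare remainder left for a second iteration.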

Michael Kay questioned whether the metric was reliable or sufficient. I think, of course, there are many other aspects that come into play, but at a minimum an objective number provides more information than just guesswork. In particular, it lets you see whether the documents have the same kind of complexity as others you have done: preventing nasty contract disputes when jobs are less or more complex than expected is an important issue.

Henry Thompson gave a funny kind of paper on his experimental URL scheme, which attempts to provide proper names for semantic resources. The idea is to provide some keywords and do a Google search on them; if a Google search on some other keywords returns a significant number of the same pages, then you take it that both searches refer to approximately the same thing. So you cobble together ten or so search results into a great long URI, and, uncle:Bob, you get a kind of proper name. My first response is that this is definitely useful for something, but I am not sure whether it is a proper name or not. For example, wouldn't "Guy Fawkes" and "Gunpowder Plot" be considered the same thing? I gather the idea is that the Semantic Web will be a fuzzy web of federated and disparate RDF databases, each with different ontologies, not a single grand unified knowledge base of facts.
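The overlap test at the heart of that idea can be sketched in a few lines of Python. This is only my illustration, not Henry Thompson's scheme: no real search API is called, the result lists are passed in as plain lists of URLs, and both the overlap measure and the 0.5 threshold are assumptions standing in for "a significant number of the same pages".

```python
def overlap(results_a, results_b):
    """Fraction of the smaller result set that also appears in the other."""
    a, b = set(results_a), set(results_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def same_referent(results_a, results_b, threshold=0.5):
    """Treat two keyword searches as naming roughly the same thing when
    their result pages overlap enough (threshold is an assumed value)."""
    return overlap(results_a, results_b) >= threshold


# Hypothetical result lists; the URLs are invented for illustration.
guy_fawkes = ["example.org/fawkes", "example.org/plot", "example.org/1605"]
gunpowder_plot = ["example.org/plot", "example.org/1605", "example.org/catesby"]

print(same_referent(guy_fawkes, gunpowder_plot))
```

With these invented lists the two searches share two of three pages and so count as the same thing, which is exactly the worry about a person and an event collapsing into one name.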

The standout paper was Jeni Tennison's paper on her Datatyping language. We are considering this at ISO for inclusion in ISO Document Schema Definition Languages (DSDL), depending on proof of concept and interest. It solves all the kinds of problems publishers face that XML Schemas Datatypes sweeps under the carpet: how to have dates in a localized format, how to have multilingual booleans, how to convert between inches and centimeters. In other words, it starts off with the assumption that a document (or database) contains strings expressing their information in the way most natural to the human users, not in a transnational format that requires intervening user interfaces.

It seems that Jeni's idea can be treated as a layer underneath XML Schemas datatypes: it might provide a way to have the power of facets, but not the limitations of a fixed set of primitive types and fixed lexical spaces.
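To make the publishers' problems concrete, here is a small Python sketch of the three cases mentioned: localized dates, multilingual booleans, and inch/centimeter conversion. It does not use or imitate the syntax of the Datatyping language itself; the tables, locale tags, and function names are all assumptions made up for this example.

```python
from datetime import date, datetime

# Hypothetical lookup tables: which human-readable strings mean what.
BOOLEAN_WORDS = {
    "en": {"yes": True, "no": False},
    "fr": {"oui": True, "non": False},
    "de": {"ja": True, "nein": False},
}
DATE_FORMATS = {"en-US": "%m/%d/%Y", "en-GB": "%d/%m/%Y", "de": "%d.%m.%Y"}
CM_PER_INCH = 2.54  # exact by definition of the international inch


def parse_boolean(text, lang):
    """Map a localized yes/no word to a boolean value."""
    return BOOLEAN_WORDS[lang][text.strip().lower()]


def parse_date(text, locale_tag) -> date:
    """Read a date written in the numeric form assumed for the locale."""
    return datetime.strptime(text.strip(), DATE_FORMATS[locale_tag]).date()


def length_in_cm(text):
    """Normalize a length such as '10 in' or '25.4 cm' to centimeters."""
    number, unit = text.split()
    value = float(number)
    if unit == "in":
        return value * CM_PER_INCH
    if unit == "cm":
        return value
    raise ValueError(f"unknown unit: {unit}")
```

The point of the sketch is that the same string, say "03/04/2005", maps to different values depending on the declared lexical form, which a fixed lexical space cannot express.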


Read More Entries by Rick Jelliffe.
