Snow Season in Schemaland
I rather expected the recent W3C Schema Experience Workshop would be a snow job. People who make something are not the proper people to conduct a review of it. It is the basic principle of auditing, and of code reviewing. If I had evangelized my employer and its customers to invest heavily and strategically in a technology I had been pivotal in developing, it would be unrealistic to expect me to impartially filter and adjudge user feedback and ultimately come up with anything other than variants of "the teething period is taking longer than expected, don't panic, no change is required."
So right from the start, when looking at this workshop, I think we need to acknowledge that we should not imagine it to be an independent review. For that, the ringleaders (Henry Thompson, Noah Mendelsohn, et al), the divide-and-conquerers, and the rest of us meek, herded sheep from the W3C XML Schema Working Group would have had to be kept out of the loop. The Workshop doesn't seem to have been a complete festival of arse-covering; but the giddy danger of unshaded posteriors loomed large.
Minutes are online at http://www.w3.org/2005/06/21-xsd-user-minutes.html and http://www.w3.org/2005/06/22-xsd-user-minutes.html. Presubmitted comments, including one by me (endorsed to some extent by Microsoft; see below), are at http://www.w3.org/2005/05/25-schema/all.zip. All presentations are at http://www.w3.org/2005/06/presentations.zip. Thanks to Mike Champion for the heads-up about these on XML-DEV.
So even though I am sceptical that the workshop could result in any adjustment of course for XML Schemas, the input reports are extremely interesting. Here are some quotes: do you see any pattern?
- ACORD:
...we have random implementations and understandings of the specification and it is usually up to the unwary user at the most inopportune moment to find out that a feature is not supported.
- BEA:
we believe that the complexity of Schema is, in many scenarios,
unnecessary, and often actively harmful. More than anything, this is the
feedback we hear from our customers and end users. We are also concerned
that Schema’s complexity may be the root cause of so many incomplete
and/or incorrect implementations.
- BT:
In our experience, XML Schema is implemented inconsistently in vendor
tools, especially those which used schemas to generate mappings into code
and other forms of data. Working around interoperability issues with
vendor supplied tools is difficult and sometimes impossible when using XML
Schemas published by third-parties, standards organisations and
consortia.
- HL7:
HL7's experiences with designing schemas that work across a broad
array of tools has been extremely disheartening.
- IBM:
Anecdotal feedback from the Web services community has suggested
that inconsistent support for certain valid XML Schema constructs across
the various development tool environments has contributed significantly to
the interoperability challenges faced by Web services developers.
- Microsoft:
Rick Jelliffe [RJ05] describes the situation faced by an
actual customer where incomplete support for XSD in various products has
“stuffed up the ready interoperability they thought they were buying into
with XML.” That sums up the problem nicely and mirrors the experience of
many of our users.
- OAGi:
Complex type derivation by restriction simply does not work.
- Rogue Wave:
Many customer issues come from schemas that are not valid. In
almost all cases this is the result of a schema generated by a tool.
- SAP:
In a heterogeneous environment, usability, implementation and
language binding issues impact the choice of features that are supported.
This led SAP to have different levels of support for XML Schema features
due to the impedance mismatch between the language constructs and XML
Schema constructs or the frequency of use of certain XML Schema
constructs. SAP favors this direction (a profiling mechanism) as a
catalog of profiles and their constraints could then be published and used
as basis for interoperability when designing a certain class of language
bindings and applications as well as business vocabularies. Further, it
would help reduce interoperability issues for hard to implement and
understand features of XML Schema.
- Sun:
When people run these broken schemas against our schema compiler, we
reject it as an error, which only make them think that ours is broken.
This trouble-shooting can get quite complicated if it involves in a type
hierarchy, substitution groups, and/or wildcards.
- WS-I:
...few web services implementers use validating XML Schema
processors. Many users "validate" SOAP messages using only inherent
SOAP-processing mechanisms, possible with some uncoordinated help from
type serializers. This situation often means that XML Schema constructs
like type facets and PSVI are ignored when web services messages are being
processed, which in turn discourages the use of such constructs by Schema
Authors.
So tool-makers blame users for generating non-standard schemas, users blame the spec for being too difficult for them to know whether their schemas are standard or not, and spec makers blame tool makers for not implementing the spec properly. Who will free us from this cycle of sin and death?
The W3C response is a databinding best practices blog and group. Better than nothing?
Syntactic ambiguity versus semantic ambiguity
Reading the material, there is almost constant agreement that the UPA is a frequent stumbling block. (The Unique Particle Attribution rule is XML Schema's mechanism to prevent schema ambiguity.) I was disappointed to see the usual spurious comments still being made about RELAX NG, which doesn't require UPA: first, that it couldn't be used for data binding, and second, that annotations would be lost without a UPA equivalent. The approach RELAX NG takes is to allow ambiguity in the base, and then move UPA checking out to be an application-dependent constraint. (The idea of layering XML Schemas is anathema to some people, however.) As for unambiguous annotations: for RELAX NG, the fact that there are two branches in the grammar is a syntax issue, not a semantics issue. The annotations on syntactically alternate but semantically identical nodes will be identical, and a schema validator (a nice clean extra layer) can check this.
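To make the UPA complaint concrete, here is a minimal sketch in Python of the kind of ambiguity the rule forbids. It is a toy under loud assumptions: it only handles a flat sequence of element particles, whereas the real rule also covers choices, nested groups, wildcards and substitution groups. The names `Particle` and `upa_conflicts` are invented for illustration and come from no schema library.

```python
from typing import List, NamedTuple, Optional, Tuple


class Particle(NamedTuple):
    """One element particle in a flat sequence (hypothetical toy model)."""
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1  # None stands for "unbounded"


def upa_conflicts(sequence: List[Particle]) -> List[Tuple[int, int]]:
    """Return index pairs of particles that can compete for the same element."""
    conflicts = []
    for i, first in enumerate(sequence):
        variable = first.max_occurs is None or first.min_occurs < first.max_occurs
        if not variable:
            continue  # a fixed count never leaves the parser a choice
        for j in range(i + 1, len(sequence)):
            later = sequence[j]
            if later.name == first.name:
                conflicts.append((i, j))  # same name reachable at the same point
            if later.min_occurs > 0:
                break  # a required particle blocks everything after it
    return conflicts


# (a?, a): on seeing an <a>, is it the optional one or the required one?
print(upa_conflicts([Particle("a", 0, 1), Particle("a")]))
# (a?, b, a): the required <b> separates the two, so there is nothing to attribute twice.
print(upa_conflicts([Particle("a", 0, 1), Particle("b"), Particle("a")]))
```

Both content models accept perfectly sensible documents; the first is rejected only because the attribution of an element to a particle is not unique, which is the syntactic flavour of ambiguity discussed above.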
What to do?
My own take on all this? Well, I am surprised that the user experience feedback is so frank and so bad. Like I said, I expected a snow job.
XML Schemas needs to be refactored so that it
- uses current syntax and namespace where possible
- allows at its base something more like the RELAX NG static-typing structural model, with its semantic-not-syntactic UPA model
- uses the XQuery datatype model: keep XSD datatypes as much as possible
- factors out runtime or dynamic typing (xsi:type), the runtime schema discovery model, key/uniqueness, derivation-checking (!), and syntactic UPA checking to other related optional parts
- embeds Schematron for co-constraints
The only way that XML Schemas can be refactored is with a different core XML Schemas working group.
My current expectation is that a lot of nothing will happen until XQuery/XSLT2 becomes seen as a more central technology than XML Schemas; the goal will then be how to support XQuery most minimally.
In other words, I don't see the PSVI and the (XQuery) XML Schemas datatypes/facets going away. But all the rest needs to be refactored and layered: wildcards, type derivation, uniqueness and references, lax/strict/NVDL validation, runtime checking and so on. The car needs a new engine.
Which versus How
But I think there is an architectural difficulty that also needs to be addressed. A SOAP envelope and the SOAP payload belong to different layers, and it is incorrect, in general, to think of a schema for the document or even of one effective schema made from all the schemas for the various namespaces. Each layer in a multi-layer protocol is concerned with the semantics of its control data but only the minimum syntax of its payload. When using XML, control data envelops the payload, confusing the layers. The result is that people either validate too much or not at all. Schematron's phase mechanism and ISO DSDL's Part 4 Namespace-based validation dispatching language (NVDL) get it more right.
What DSDL Part 4 gets right is that selecting which information to validate should be a separate issue from how the information is validated. The Schema Experience Workshop has some people wanting better wildcards, and other people wanting fewer: the superficial conclusion is that these are irreconcilable differences and so they cannot both be satisfied. But many use cases could be satisfied by taking wildcards out of the schema language, and adding a projection layer, such as NVDL.
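The which/how split can be sketched in a few lines of Python. This is not NVDL, only an illustration of the idea under stated assumptions: the payload namespace `urn:example:orders`, the two toy validators, and the function names are all hypothetical, and the cut is made simply wherever a child's namespace differs from its parent's.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
ORDER_NS = "urn:example:orders"  # hypothetical payload namespace


def namespace_of(element):
    """Extract the namespace URI from ElementTree's '{uri}local' tag form."""
    return element.tag[1:].split("}", 1)[0] if element.tag.startswith("{") else ""


def split_by_namespace(root):
    """'Which': cut the tree into single-namespace sections."""
    sections = []

    def copy_section(element):
        ns = namespace_of(element)
        clone = ET.Element(element.tag, dict(element.attrib))
        clone.text = element.text
        for child in element:
            if namespace_of(child) == ns:
                clone.append(copy_section(child))
            else:  # a foreign child starts a section of its own
                sections.append((namespace_of(child), copy_section(child)))
        return clone

    sections.insert(0, (namespace_of(root), copy_section(root)))
    return sections


def check_envelope(section):
    # Envelope layer: only cares that a Body exists, not what it carried.
    return section.find(f"{{{SOAP_NS}}}Body") is not None


def check_order(section):
    # Payload layer: hypothetical rule that an order has at least one item.
    return section.find(f"{{{ORDER_NS}}}item") is not None


VALIDATORS = {SOAP_NS: check_envelope, ORDER_NS: check_order}


def validate(xml_text):
    """'How': hand each section to the validator registered for its namespace."""
    results = []
    for ns, section in split_by_namespace(ET.fromstring(xml_text)):
        validator = VALIDATORS.get(ns)
        # Sections in unknown namespaces are skipped here and reported as None.
        results.append((ns, validator(section) if validator else None))
    return results


MESSAGE = """<env:Envelope xmlns:env="http://schemas.xmlsoap.org/soap/envelope/">
  <env:Body>
    <o:order xmlns:o="urn:example:orders"><o:item>widget</o:item></o:order>
  </env:Body>
</env:Envelope>"""

print(validate(MESSAGE))
```

Neither validator needs a wildcard: the envelope check never sees the order, and the order check never sees the envelope. Whether foreign content is allowed at a given point becomes a decision of the dispatching layer rather than of either schema.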
I think there is the glimmer of awareness of this: Henry Thompson has recycled James Clark's double validation method (used by James to validate XHTML) for a certain XML Schema issue, and David Orchard and others have been working on schema projection, which is like NVDL but using DBMS concepts.
But I bet you won't be seeing ads for XSD tools or systems with "extremely disheartening", "difficult and sometimes impossible", and "trouble-shooting can get quite complicated" :-) Or will we get nothing but sunshine, snow and anal modesty?
By Rick Jelliffe.

Erratum
I checked it again, and Henry Thompson's method is different to James'. James uses two schemas for XHTML: one for most things, the other for preventing recursive elements like <a>. Henry's method is to validate once, discard elements that have some PSVI result (e.g. that are not valid), and then revalidate with the original schema. But it is still a method of selecting elements to validate, like NVDL, so it doesn't change my point. Sorry for the wrong comparison there!
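For readers who have not met the two-schema idea attributed to James above, here is a rough Python sketch of its shape. Everything in it is a stand-in: the `ALLOWED` table is a toy content model rather than XHTML's, and the two functions play the role of the two schemas; real use would run two actual schemas over the same document.

```python
import xml.etree.ElementTree as ET

# Toy content models (hypothetical, far smaller than XHTML's).
ALLOWED = {"p": {"a", "em"}, "a": {"em"}, "em": {"a", "em"}}


def general_check(root):
    """First 'schema': every child must be allowed by its parent's content model."""
    return all(child.tag in ALLOWED.get(parent.tag, set())
               for parent in root.iter() for child in parent)


def no_nested(root, tag="a"):
    """Second 'schema': no <a> anywhere inside another <a>, at any depth."""
    return not any(inner is not outer
                   for outer in root.iter(tag) for inner in outer.iter(tag))


def valid(xml_text):
    """A document is accepted only if both independent checks pass."""
    root = ET.fromstring(xml_text)
    return general_check(root) and no_nested(root)


print(valid("<p><a><em>fine</em></a></p>"))
print(valid("<p><a><em><a>nested link</a></em></a></p>"))
```

The second document satisfies the general content models, because <em> may hold an <a>, and is caught only by the second pass. Keeping the exclusion in its own pass is what saves the main schema from duplicating every inline element in a "not inside <a>" variant.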