One of the projects I have been working on recently is a proof-of-concept system that takes a rules-based approach to automatically classifying and annotating XML-in-ZIP documents. The approach we have taken is to use Schematron, relying on its report elements rather than its assert elements.
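As a rough illustration of the report-style idea (this is not the project's code), here is a minimal Python sketch. A rule fires when its pattern is found, the way a Schematron report does, rather than when it is missing, the way an assert does. The rule labels, the element names and the sample document are all made up for the example, and the paths use the limited XPath subset in Python's `xml.etree.ElementTree` rather than full Schematron.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules: (label, ElementTree path). Like a Schematron <report>,
# a rule fires when its pattern IS found, not when it is absent.
RULES = [
    ("has-table", ".//table"),
    ("has-image", ".//image"),
    ("draft", ".//meta[@name='status'][@content='draft']"),
]


def classify(xml_text):
    """Return the label of every rule whose path matches the document."""
    root = ET.fromstring(xml_text)
    return [label for label, path in RULES if root.find(path) is not None]


# Made-up sample document: it contains a table and a draft marker, no image.
sample = """<doc>
  <meta name="status" content="draft"/>
  <body><table/><p>text</p></body>
</doc>"""

print(classify(sample))
```

Each label that comes back is effectively an annotation on the document, which is why report fits classification better than assert: nothing here is an error, it is just a fact worth noting.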
I was surprised to read a review of Schematron and other schema languages which cited the lack of localization as an important reason not to use it, so the next release of the skeleton has localized messages. Here is the approach I took to localizing the XSLT.
Thoughts on Schematron headers for processing ODF and OOXML, with a C# URL Resolver that handles ZIP files like some Java resolvers. XML-in-ZIP documents present a new challenge: constraints that formerly would have been kept in a single document are now split across multiple documents. When the basic information is kept in a single XML file, validation is reasonably straightforward. The current range of schema tools supports these kinds of intra-document invariants quite well. But no document is an island, so Schematron also supports a range of inter-document constraints, though it may be time to enhance it to support the XML-in-ZIP issues better.
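To make the split-constraint problem concrete, here is a small Python sketch, offered only as an illustration and not as the C# resolver or a Schematron header. It builds a tiny ZIP in memory with two parts and checks a constraint that spans both: every style used in one part must be declared in the other. The part names echo ODF, but the element and attribute names (`p`, `style`, `name`) are simplified assumptions, not the real ODF vocabulary.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def build_package():
    """Build a tiny in-memory ZIP holding two hypothetical XML parts."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("content.xml",
                   '<doc><p style="S1"/><p style="S2"/><p style="S9"/></doc>')
        z.writestr("styles.xml",
                   '<styles><style name="S1"/><style name="S2"/></styles>')
    buf.seek(0)
    return buf


def undefined_styles(package):
    """Return style names used in content.xml but not declared in styles.xml."""
    with zipfile.ZipFile(package) as z:
        content = ET.fromstring(z.read("content.xml"))
        styles = ET.fromstring(z.read("styles.xml"))
    declared = {s.get("name") for s in styles.iter("style")}
    used = {p.get("style") for p in content.iter("p") if p.get("style")}
    return sorted(used - declared)


print(undefined_styles(build_package()))
```

Neither part is invalid on its own; the problem only shows up when both are read together, which is exactly the kind of check a single-document schema cannot express.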