I'm trying to validate some existing XML files in my project. An example structure looks like:
<resources>
  <file name="one" path="C:\test\one.txt" />
  <cache path="C:\test\cache\" />
  <file name="two" path="C:\test\two.txt" />
  <bundle name="myFolder">
    <file name="three" path="C:\test\three.txt" />
    <file name="four" path="C:\test\four.txt" />
  </bundle>
  <file name="one" path="C:\test\one.txt" />
  <bundle name="myFolder">
    <file name="three" path="C:\test\three.txt" />
    <file name="four" path="C:\test\four.txt" />
  </bundle>
  <file name="one" path="C:\test\one.txt" />
</resources>
In words, what I want is a structure with root element resources, which has as children a cache element and file and bundle elements, all in any order, where each bundle contains any number of file elements.
This is my current XSD (drawing from this answer):
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:complexType name="cache">
    <xs:attribute name="path" type="xs:string" />
  </xs:complexType>
  <xs:complexType name="file">
    <xs:attribute name="name" type="xs:string" />
    <xs:attribute name="path" type="xs:string" />
  </xs:complexType>
  <xs:complexType name="bundle">
    <xs:choice maxOccurs="unbounded">
      <xs:element name="file" type="file" maxOccurs="unbounded" />
    </xs:choice>
    <xs:attribute name="type" />
    <xs:attribute name="name" />
  </xs:complexType>
  <xs:group name="unboundednoncache">
    <xs:choice>
      <xs:element name="file" type="file" />
      <xs:element name="bundle" type="bundle" />
    </xs:choice>
  </xs:group>
  <xs:element name="resources">
    <xs:complexType>
      <xs:sequence>
        <!-- Validates if this next line is removed,
             and the cache element is moved to first -->
        <xs:group ref="unboundednoncache" minOccurs="0" maxOccurs="unbounded" />
        <xs:element name="cache" type="cache" minOccurs="0" maxOccurs="1" />
        <xs:group ref="unboundednoncache" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
This gives me the error:
cos-nonambig: file and file (or elements from their substitution group) violate "Unique Particle Attribution". During validation against this schema, ambiguity would be created for those two particles.
I can get the first XSD to validate if I remove the first <xs:group> in the XSD and move the cache element to be the first child in the XML, but I want the cache element to be valid anywhere.
(Prior to this version I was using:
<xs:choice maxOccurs="unbounded">
  <xs:element name="cache" type="cache" minOccurs="0" maxOccurs="1" />
  <xs:element name="file" type="file" minOccurs="0" maxOccurs="unbounded" />
  <xs:element name="bundle" type="bundle" minOccurs="0" maxOccurs="unbounded" />
</xs:choice>
... but that allows multiple cache elements, which I don't want either.)
Why does my XSD violate "unique particle attribution", and how can I fix it?
In your schema document, you have written the content model of resources as the equivalent of
((file | bundle)*, cache?, (file | bundle)*)
This is semantically correct, but an initial file element could match either the first or the second occurrence of file in the content model. For reasons best left decently obscure, this is not allowed by XSD.
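To see the problem in isolation, here is a minimal sketch that keeps only the shape (file*, cache?, file*). It is a hypothetical schema, not taken from the question, and the xs:string types are placeholders. A conforming validator would be expected to reject it with the same kind of cos-nonambig complaint.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical minimal schema: only the shape (file*, cache?, file*) is kept. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="resources">
    <xs:complexType>
      <xs:sequence>
        <!-- First file particle: a leading file element can be attributed here ... -->
        <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
        <xs:element name="cache" type="xs:string" minOccurs="0" maxOccurs="1" />
        <!-- ... or here, since the first file particle and cache may both match nothing. -->
        <xs:element name="file" type="xs:string" minOccurs="0" maxOccurs="unbounded" />
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

The validator cannot tell which of the two file particles a leading file element belongs to without looking ahead for a cache element, and that lookahead is exactly what the rule forbids.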
So you need an equivalent content model which is deterministic. Not all non-deterministic content models have deterministic equivalents, but yours does:
((file | bundle)*, (cache, (file | bundle)*)?)
Or, in XSD syntax (reusing your definition of unboundednoncache):
<xs:group name="cache-plus-noncache">
  <xs:sequence>
    <xs:element name="cache" type="cache"
                minOccurs="1" maxOccurs="1" />
    <xs:group ref="unboundednoncache"
              minOccurs="0" maxOccurs="unbounded" />
  </xs:sequence>
</xs:group>
<xs:element name="resources">
  <xs:complexType>
    <xs:sequence>
      <xs:group ref="unboundednoncache"
                minOccurs="0" maxOccurs="unbounded" />
      <xs:group ref="cache-plus-noncache"
                minOccurs="0" maxOccurs="1" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
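As a quick sanity check, assuming the two declarations above replace the original resources declaration in your schema, documents like the following illustrate the intended behaviour. Both are hypothetical test inputs built from the element names in your example. The first, with cache between other children, should be accepted:

```xml
<!-- Expected to be valid: file matches the leading group, then cache, then bundle. -->
<resources>
  <file name="one" path="C:\test\one.txt" />
  <cache path="C:\test\cache\" />
  <bundle name="myFolder">
    <file name="three" path="C:\test\three.txt" />
  </bundle>
</resources>
```

The second, with two cache elements, should be rejected, because cache-plus-noncache may occur at most once and contains exactly one cache:

```xml
<!-- Expected to be invalid: the second cache has no particle left to match. -->
<resources>
  <cache path="C:\test\cache\" />
  <file name="one" path="C:\test\one.txt" />
  <cache path="C:\test\cache\" />
</resources>
```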
It might be handy to have a tool that read content models, detected violations of the determinism (aka 'unique particle attribution') rule, and either proposed a deterministic equivalent or broke the bad news that the content model has no deterministic equivalent. The theory has been worked out pretty cleanly by Anne Brüggemann-Klein. But so far I am not aware of such a tool, and my periodic resolutions to write one have thus far not borne fruit.