I want to require that an <a-special/> element occurs at least once in my document. Under such a grammar, a document like this would be valid (since <a-special/> occurs):
<my-container>
text <a id="1" type="B"/> text text <a-special/>
text text <a id="5" type="B"/> text <a id="24" type="B"/>
text <a id="5" type="C"/>
</my-container>
whereas this would be considered invalid (since <a-special/> does not occur):
<my-container>
<a id="1" type="B"/> text text
text <a id="5" type="B"/> text <a id="24" type="B"/>
text <a id="5" type="C"/>
</my-container>
I have tried different things with the grammar below but I can't seem to make it work the way I need it.
<!ELEMENT my-container ( #PCDATA | a | a-special | b )*>
<!ELEMENT a-special EMPTY>
<!ELEMENT a EMPTY>
<!ATTLIST a id CDATA #REQUIRED>
<!ATTLIST a type CDATA #REQUIRED>
<!ELEMENT b EMPTY>
<!ATTLIST b id CDATA #REQUIRED>
<!ATTLIST b type CDATA #REQUIRED>
I know this is wrong but I was thinking about something like this:
<!ELEMENT my-container
a-special+ ( #PCDATA | a | b | a-special )*
| ( #PCDATA | a | b )+ a-special+ ( #PCDATA | a | b | a-special )*
>
The first part would parse anything that starts with a-special, and the second part would parse anything that has an a-special somewhere in between or at the end. Can this be done with a DTD grammar?
The constraint you want to enforce cannot be stated with an XML DTD.
If your outermost element really is just a sequence of character data and empty children, the content-model-like expression you mention would (after supplying the missing commas) capture the constraint accurately:
((#PCDATA | a | b)*, a-special, (#PCDATA | a | b | a-special)*)
This would be legal in SGML (or so I think, but I haven't checked). But the only allowable forms for mixed content in XML DTDs are
(#PCDATA)
(#PCDATA | x | y | ... | z)*
(#PCDATA)*
The constraint described would be expressible in XSD or in RELAX NG, both of which allow a mixed content model to constrain the order and number of child elements.
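As a rough illustration, the content model above might be written in XSD along the following lines. This is a sketch I have not run through a validator. The element names come from the question; leaving a, b and a-special without a declared type (so that they default to xs:anyType) is a simplification of mine, and a real schema would declare them as empty elements with their attributes.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="my-container">
    <!-- mixed="true" lets character data appear between the children -->
    <xs:complexType mixed="true">
      <xs:sequence>
        <!-- any number of a or b elements before the first a-special -->
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element ref="a"/>
          <xs:element ref="b"/>
        </xs:choice>
        <!-- the one a-special that must be present -->
        <xs:element ref="a-special"/>
        <!-- anything else afterwards, including further a-special elements -->
        <xs:choice minOccurs="0" maxOccurs="unbounded">
          <xs:element ref="a"/>
          <xs:element ref="b"/>
          <xs:element ref="a-special"/>
        </xs:choice>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

  <!-- Simplification: no type given, so these default to xs:anyType -->
  <xs:element name="a"/>
  <xs:element name="b"/>
  <xs:element name="a-special"/>

</xs:schema>
```

Unlike in a DTD, the mixed-content flag here is independent of the element content model, which is why the sequence can insist on one a-special.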
If any elements other than the document element are allowed to be non-empty, then the constraint is not expressible with content models in any schema language I know of: content models function as a sort of context-free grammar, and the requirement that there be an a-special element somewhere in the document entails a form of context-sensitivity.
As @potame observed in a comment, Schematron could formulate the constraint; so could an assertion in XSD 1.1, attached to the declaration of the document element.
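For illustration, a Schematron rule for the "somewhere in the document" version of the constraint could look roughly like this. It is a minimal sketch, not a tested schema: the rule context and the message text are my own choices, and it assumes the elements are in no namespace.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
  <sch:pattern>
    <!-- fire once, on the document element, whatever it is called -->
    <sch:rule context="/*">
      <!-- the assertion fails when no a-special exists at any depth -->
      <sch:assert test=".//a-special">
        The document must contain at least one a-special element.
      </sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```

The same XPath test, .//a-special, should also work as the test of an xs:assert placed in the complex type of the document element in XSD 1.1.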
One possible workaround: mark the specialness of the element in a different way, e.g. by pointing at some a elements in the document:
<!ELEMENT my-container (#PCDATA|a|b)* >
<!ATTLIST my-container specials IDREFS #REQUIRED >
<!ELEMENT a EMPTY >
<!ATTLIST a id ID #IMPLIED>
<!ELEMENT b EMPTY>
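Since my-container/@specials is required, it must name at least one element in the document (an IDREFS value must contain at least one name, and each name must match an ID that actually occurs in the document). Since the only element type for which IDs are defined is a, the elements named by specials are guaranteed to be a elements.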
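To make the workaround concrete, here is a small document that should be valid against those declarations, with the DTD placed in the internal subset. The id values are made up for the example. Note that ID values must be XML names, so they cannot start with a digit as the ids in the question do.

```xml
<!DOCTYPE my-container [
  <!ELEMENT my-container (#PCDATA|a|b)* >
  <!ATTLIST my-container specials IDREFS #REQUIRED >
  <!ELEMENT a EMPTY >
  <!ATTLIST a id ID #IMPLIED>
  <!ELEMENT b EMPTY>
]>
<!-- specials points at the a element that plays the role of a-special -->
<my-container specials="a5">
  text <a id="a1"/> text text <a id="a5"/>
  text <b/> text <a id="a24"/>
</my-container>
```

Removing the specials attribute, leaving it empty, or pointing it at an id that no a element carries should each make a validating parser report an error.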