In short, my question is: when extending a mixed complex type in W3C XML Schema, do we have to declare mixed="true" on the extending type explicitly? Or is it inherited implicitly from the fact that the base type is mixed?
The simplest example I was able to create is the following.
Given the following XML Schema "foo.xsd" file
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="mixedElement" mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="elem"/>
</xs:choice>
</xs:complexType>
<xs:element name="root">
<xs:complexType>
<xs:complexContent>
<xs:extension base="mixedElement"/>
</xs:complexContent>
</xs:complexType>
</xs:element>
</xs:schema>
where the root element extends the complex type mixedElement (explicitly mixed="true") but does not itself declare mixed="true", and given the following trivial XML file "foo.xml"
<?xml version="1.0" encoding="utf-8"?>
<root
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="foo.xsd">abc<elem/>xyz</root>
where the content of the root element is mixed, we find that according to xmllint
xmllint --noout --schema foo.xsd foo.xml
"foo.xml" is valid, but according to the XML Schema Validator (Python library) it isn't, because
File "/usr/lib/python3/dist-packages/xmlschema/validators/schemas.py", line 1678, in validate
raise error
xmlschema.validators.exceptions.XMLSchemaValidationError: failed validating <Element 'root' at 0x7f3592775260> with XsdGroup(model='sequence', occurs=[1, 1]):
Reason: character data between child elements not allowed
The "foo.xml" file is valid according to both validators if I add mixed="true" to the root element
<xs:element name="root" mixed="true"> <!-- added mixed="true" -->
So the question is: according to the W3C XML Schema recommendations, who's right? xmllint, which says that "foo.xml" is valid? Or the Python XML Schema Validator library, which says it isn't?
--- EDIT ---
As suggested by Michael Kay in his answer ("See what happens if you make the extension non-trivial, by adding some element content"), I've added an element elem2 in the extension in the schema (new file "bar.xsd")
<?xml version="1.0" encoding="utf-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:complexType name="mixedElement" mixed="true">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="elem"/>
</xs:choice>
</xs:complexType>
<xs:element name="root">
<xs:complexType>
<xs:complexContent>
<xs:extension base="mixedElement">
<xs:choice minOccurs="0" maxOccurs="unbounded">
<xs:element name="elem2"/>
</xs:choice>
</xs:extension>
</xs:complexContent>
</xs:complexType>
</xs:element>
</xs:schema>
and added the element elem2 to the XML file ("bar.xml")
<?xml version="1.0" encoding="utf-8"?>
<root
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="bar.xsd">abc<elem/>xyz<elem2>123</elem2>abc</root>
Now both validators agree: the schema itself is rejected, so the XML file cannot be validated against it.
According to xmllint
bar.xsd:12: element complexType: Schemas parser error : local complex type: The content type of both, the type and its base type, must either 'mixed' or 'element-only'.
WXS schema bar.xsd failed to compile
and also according to the Python library
xmlschema.validators.exceptions.XMLSchemaParseError: base has a different content type (mixed=True) and the extension group is not empty:
Reading the spec: XML Schema 1.0 part 1, section 3.4.6, subsection Schema Component Constraint: Derivation Valid (Extension), rule 1.4.3.2.2.1 says: "Both {content type}s must be mixed or both must be element-only."
That would suggest that this schema is invalid.
However, Saxon-EE (my schema processor) accepts it, and Saxon-EE passes all the conformance tests, which suggests I have missed something.
Now, if I change the derived type to say mixed="false", Saxon still accepts it, and that seems to directly contradict the rule 1.4.3.2.2.1 just cited. That strongly suggests to me that (a) Saxon is getting it wrong, and (b) there is no test for this in the W3C test suite.
I don't see anything in the spec to suggest that the derived type inherits the property from the base type.
As an aside, XSD 1.1 corrects an omission in XSD 1.0 by saying clearly that if a complexType and its complexContent child both have a mixed attribute, they must be consistent.
I have raised a Saxon bug at https://saxonica.plan.io/issues/6523
Update:
I think the XSD rule that this example actually violates is Schema Component Constraint: Derivation Valid (Extension), rule 1.4.3.1. This is expressed in XSD 1.0 as "The {content type} of the complex type definition itself must specify a particle" and more clearly in XSD 1.1 as "T.{content type}.{variety} = element-only or mixed". What rule 1.4 as a whole is saying is that when you derive a complex type by extension, then either (1.4.1) both must have simple content, or (1.4.2) both must have empty content, or (1.4.3) both of the following hold: (1.4.3.1) the extension must not be empty, and (1.4.3.2) other conditions.
Saxon isn't enforcing 1.4.3.1, by the looks of it.
See what happens if you make the extension non-trivial, by adding some element content.
I have no idea, incidentally, why the spec allows an empty extension of an empty type, but disallows an empty extension of a non-empty type.
Further Update
OK, I'll start again.
I'm going to use XSD 1.1 citations, though I don't think the rules are materially different.
In 3.4.2.3.3 Mapping Rules for Content Type Property of Complex Content, in the "mapping summary", the rules for the {content type} property give the following for this example:
The effective mixed is false.
The explicit content is empty.
The effective content is empty.
The effective content type is mixed (that is, it is effectively inherited from the base type; in fact it says that the mixed attribute on the derived type is ignored).
In summary, when you define a type with an empty extension, as in this example, you effectively get the base type unchanged. That's true even if the derived type explicitly says mixed="false": that appears to be ignored (and there's a comment in the Saxon code pointing this out).
The rules I cited earlier, such as Rule 1.4.3.2.2.1 ("Both {content type}s must be mixed or both must be element-only"), only apply during schema component validation, and to interpret these rules we must first understand the rules for schema component construction. The rule that both content types must be mixed is not violated, because as we've just seen, the rules for schema component construction ensure that both content types are indeed mixed, even though the source schema document doesn't say so explicitly.
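To make the two phases concrete, here is a rough Python sketch of how I read them: first the content type of the derived type is constructed, then the "both mixed or both element-only" constraint is checked. The function names and the boolean/string encoding are my own invention for illustration, not anything from the spec or from a real validator, and the model covers only complex-content extension of a base that has element-only or mixed content.

```python
from typing import Optional


def effective_mixed(on_complex_content: Optional[bool],
                    on_complex_type: Optional[bool]) -> bool:
    """mixed on <complexContent> wins, then mixed on <complexType>, else false."""
    if on_complex_content is not None:
        return on_complex_content
    if on_complex_type is not None:
        return on_complex_type
    return False


def derived_variety(base_variety: str,
                    extension_has_particle: bool,
                    on_complex_type: Optional[bool] = None,
                    on_complex_content: Optional[bool] = None) -> str:
    """Simplified construction step (hypothetical helper, my reading only)."""
    mixed = effective_mixed(on_complex_content, on_complex_type)
    # Assumption: the effective content is empty only when the extension
    # has no particle AND effective mixed is false.
    if not extension_has_particle and not mixed:
        # Empty extension: the base content type is taken over unchanged.
        return base_variety
    return "mixed" if mixed else "element-only"


def both_mixed_or_both_element_only(base_variety: str, derived: str) -> bool:
    """Constraint check applied after construction."""
    return base_variety == derived and derived in ("mixed", "element-only")


# foo.xsd: empty extension, no mixed attribute on the derived type
foo = derived_variety("mixed", extension_has_particle=False)
# same, but with mixed="false" written on the derived type
foo_false = derived_variety("mixed", False, on_complex_type=False)
# bar.xsd: non-empty extension, no mixed attribute
bar = derived_variety("mixed", extension_has_particle=True)
# bar.xsd with mixed="true" added to the derived type
bar_fixed = derived_variety("mixed", True, on_complex_type=True)

for name, v in [("foo", foo), ("foo_false", foo_false),
                ("bar", bar), ("bar_fixed", bar_fixed)]:
    print(name, v, both_mixed_or_both_element_only("mixed", v))
```

Under this reading, the empty extension ends up mixed whatever the derived type says, while the non-empty extension without mixed="true" ends up element-only and fails the constraint, which matches what xmllint and the Python library report for "bar.xsd".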