XML Schema with extension of a mixed element


In short, my question is: when extending a mixed element type in W3C XML Schema, do we have to declare mixed="true" on the extending type explicitly? Or is it implied by the fact that the base type is mixed?

The simplest example I was able to create is the following.

Given the following XML Schema "foo.xsd" file

<?xml version="1.0" encoding="utf-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="mixedElement" mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="elem"/>
    </xs:choice>
  </xs:complexType>

  <xs:element name="root">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="mixedElement"/>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>

</xs:schema>

where the root element extends the complex type mixedElement (explicitly mixed="true") but does not itself declare mixed="true", and given the following trivial XML file "foo.xml"

<?xml version="1.0" encoding="utf-8"?>

<root
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="foo.xsd">abc<elem/>xyz</root>

where the content of the root element is mixed, we have that, according to xmllint

xmllint --noout --schema foo.xsd foo.xml

"foo.xml" is valid, but according to the XML Schema validator (the xmlschema Python library) it isn't, because

  File "/usr/lib/python3/dist-packages/xmlschema/validators/schemas.py", line 1678, in validate
    raise error
xmlschema.validators.exceptions.XMLSchemaValidationError: failed validating <Element 'root' at 0x7f3592775260> with XsdGroup(model='sequence', occurs=[1, 1]):

Reason: character data between child elements not allowed

The "foo.xml" file is valid according to both validators if I add mixed="true" to the root element

<xs:element name="root" mixed="true"> <!-- added mixed="true" -->
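
As a side note on where the attribute goes: the schema language defines mixed on xs:complexType and xs:complexContent rather than on xs:element, so the attribute presumably belongs on the xs:complexType child of root. The following is a minimal sketch, using only the Python standard library, that lists every mixed attribute in a schema and flags the ones sitting on an element that does not take it. The helper name mixed_declarations and the SCHEMA string are illustrative, not part of either validator.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
# Elements on which the schema language defines a `mixed` attribute.
ALLOWS_MIXED = {f"{{{XS}}}complexType", f"{{{XS}}}complexContent"}


def mixed_declarations(xsd_text):
    """Return (local tag, mixed value, allowed-here flag) for each `mixed` attribute."""
    root = ET.fromstring(xsd_text)
    found = []
    for node in root.iter():
        if "mixed" in node.attrib:
            local = node.tag.split("}")[-1]
            found.append((local, node.attrib["mixed"], node.tag in ALLOWS_MIXED))
    return found


# foo.xsd with mixed="true" placed on the complexType of root (an assumption
# about what the one-line change above was meant to be).
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="mixedElement" mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="elem"/>
    </xs:choice>
  </xs:complexType>
  <xs:element name="root">
    <xs:complexType mixed="true">
      <xs:complexContent>
        <xs:extension base="mixedElement"/>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

for tag, value, ok in mixed_declarations(SCHEMA):
    print(tag, value, "ok" if ok else "not allowed here")
```

This only inspects attribute placement; it does not validate the schema or any instance document.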

So the question is: according to the W3C XML Schema recommendations, who's right?

xmllint, which says that "foo.xml" is valid?

Or the Python XML Schema validator library, which says it isn't?

--- EDIT ---

As suggested by Michael Kay in his answer ("See what happens if you make the extension non-trivial, by adding some element content"), I've added an element elem2 in the extension in the schema (new file "bar.xsd")

<?xml version="1.0" encoding="utf-8"?>

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:complexType name="mixedElement" mixed="true">
    <xs:choice minOccurs="0" maxOccurs="unbounded">
      <xs:element name="elem"/>
    </xs:choice>
  </xs:complexType>

  <xs:element name="root">
    <xs:complexType>
      <xs:complexContent>
        <xs:extension base="mixedElement">
          <xs:choice minOccurs="0" maxOccurs="unbounded">
            <xs:element name="elem2"/>
          </xs:choice>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:element>

</xs:schema>

and added the element elem2 to the XML file ("bar.xml")

<?xml version="1.0" encoding="utf-8"?>

<root
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:noNamespaceSchemaLocation="bar.xsd">abc<elem/>xyz<elem2>123</elem2>abc</root>

Now both validators agree.

The XML file isn't valid according to xmllint

bar.xsd:12: element complexType: Schemas parser error : local complex type: The content type of both, the type and its base type, must either 'mixed' or 'element-only'.
WXS schema bar.xsd failed to compile

and also according to the Python library

xmlschema.validators.exceptions.XMLSchemaParseError: base has a different content type (mixed=True) and the extension group is not empty:

Solution

  • Reading the spec: XML Schema 1.0 part 1 section 3.4.6 subsection Schema Component Constraint: Derivation Valid (Extension), rule 1.4.3.2.2.1 says: Both {content type}s must be mixed or both must be element-only.

    That would suggest that this schema is invalid.

    However, Saxon-EE (my schema processor) accepts it, and Saxon-EE passes all the conformance tests, which suggests I have missed something.

    Now, if I change the derived type to say mixed="false", Saxon still accepts it, and that seems to directly contradict rule 1.4.3.2.2.1 just cited. That suggests strongly to me that (a) Saxon is getting it wrong, and (b) there is no test for this in the W3C test suite.

    I don't see anything in the spec to suggest that the derived type inherits the property from the base type.

    As an aside, XSD 1.1 corrects an omission in XSD 1.0 by saying clearly that if a complexType and its complexContent child both have a mixed attribute, they must be consistent.

    I have raised a Saxon bug at https://saxonica.plan.io/issues/6523

    Update:

    I think the XSD rule that this example actually violates is Schema Component Constraint: Derivation Valid (Extension), rule 1.4.3.1. This is expressed in XSD 1.0 as "The {content type} of the complex type definition itself must specify a particle" and more clearly in XSD 1.1 as "T.{content type}.{variety} = element-only or mixed". What rule 1.4 as a whole is saying is that when you derive a complex type by extension, then either (1.4.1) both must have simple content, or (1.4.2) both must have empty content, or (1.4.3) both of the following must hold: (1.4.3.1) the extension must not be empty, and (1.4.3.2) other conditions.

    Saxon isn't enforcing 1.4.3.1, by the looks of it.

    See what happens if you make the extension non-trivial, by adding some element content.

    I have no idea, incidentally, why the spec allows an empty extension of an empty type, but disallows an empty extension of a non-empty type.

    Further Update

    OK, I'll start again.

    I'm going to use XSD 1.1 citations, though I don't think the rules are materially different.

    In 3.4.2.3.3 Mapping Rules for Content Type Property of Complex Content, in the "mapping summary", rules for the {content type} property:

    • Rule 1 tells us that the effective mixed is false.
    • Rule 2 tells us that the explicit content is empty.
    • Rule 3 tells us that the effective content is empty.
    • Rule 4.2.2 tells us that the effective content type is mixed (that is, it is effectively inherited from the base type; in fact it says that the mixed attribute on the derived type is ignored)
    • The remaining rules don't change anything.

    In summary, when you define a type with an empty extension, as in this example, you effectively get the base type unchanged. That's true even if the derived type explicitly says mixed="false" - that appears to be ignored (and there's a comment in the Saxon code pointing this out).
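
The construction rules listed above can be sketched as a small Python model. This is a simplified reading of rules 1 to 4.2, not the spec's full algorithm: particles are opaque strings, open content and the special cases for xs:all are ignored, and the names ContentType and extension_content_type are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContentType:
    variety: str                     # "empty", "simple", "element-only" or "mixed"
    particle: Optional[str] = None   # opaque stand-in for the content model


def extension_content_type(base, mixed_attr=None, explicit_content=None):
    """Simplified rules 1 to 4.2 for a complex type derived by extension.

    base:             ContentType of the base type definition
    mixed_attr:       True/False for the mixed attribute, None if absent
    explicit_content: stand-in for the model group inside xs:extension, or None
    """
    # Rule 1: the effective mixed is false unless the attribute says true.
    effective_mixed = mixed_attr is True
    # Rules 2 and 3: with no model group the effective content is empty,
    # unless effective mixed is true, which yields an empty sequence.
    if explicit_content is not None:
        effective_content = explicit_content
    elif effective_mixed:
        effective_content = "sequence()"
    else:
        effective_content = None
    own_variety = "mixed" if effective_mixed else "element-only"

    # Rule 4.2.1: empty or simple base, so only the derived type's content counts.
    if base.variety in ("empty", "simple"):
        if effective_content is None:
            return ContentType("empty")
        return ContentType(own_variety, effective_content)
    # Rule 4.2.2: empty effective content, so the base content type is reused as is.
    if effective_content is None:
        return base
    # Rule 4.2.3 (simplified): base particle followed by the extension's particle.
    return ContentType(own_variety, f"sequence({base.particle}, {effective_content})")


BASE = ContentType("mixed", "choice(elem)*")                                # mixedElement
FOO = extension_content_type(BASE)                                          # foo.xsd
FOO_FALSE = extension_content_type(BASE, mixed_attr=False)                  # mixed="false"
BAR = extension_content_type(BASE, explicit_content="choice(elem2)*")       # bar.xsd

print(FOO.variety, FOO_FALSE.variety, BAR.variety)
```

Under this model the empty extension in foo.xsd ends up mixed whether or not it says mixed="false", while the non-empty extension in bar.xsd ends up element-only, which is what both validators complain about.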

    The rules I cited earlier, such as rule 1.4.3.2.2.1 ("Both {content type}s must be mixed or both must be element-only"), apply only during schema component validation, and to interpret these rules we must first understand the rules for schema component construction. The rule that both content types must be mixed is not violated because, as we've just seen, the rules for schema component construction ensure that both content types are indeed mixed, even though the source schema document doesn't say so explicitly.
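
To make the two phases concrete, here is a hedged sketch of clause 1.4 of Derivation Valid (Extension), using the XSD 1.0 numbering cited earlier. Its inputs are the content type varieties of the constructed components, not the mixed attributes written in the schema document. The function name is invented for illustration, and the "same simple type" part of 1.4.1 and the particle check in 1.4.3.2.2.2 are left out.

```python
def extension_clause_1_4_ok(base_variety, derived_variety):
    """Simplified clause 1.4 of Derivation Valid (Extension).

    Each variety is "empty", "simple", "element-only" or "mixed", taken from
    the constructed schema components.
    """
    # 1.4.1: both simple (the real rule also requires the same simple type).
    if base_variety == "simple" and derived_variety == "simple":
        return True
    # 1.4.2: both empty.
    if base_variety == "empty" and derived_variety == "empty":
        return True
    # 1.4.3.1: the derived content type must specify a particle.
    if derived_variety not in ("element-only", "mixed"):
        return False
    # 1.4.3.2.1: an empty base may be extended with any particle.
    if base_variety == "empty":
        return True
    # 1.4.3.2.2.1: both mixed or both element-only.
    return base_variety == derived_variety


# foo.xsd: construction makes the derived type mixed, like its base.
print(extension_clause_1_4_ok("mixed", "mixed"))
# bar.xsd: construction makes the derived type element-only over a mixed base.
print(extension_clause_1_4_ok("mixed", "element-only"))
```

The first call corresponds to the empty extension, where construction has already copied the base content type, and the second to the non-empty extension without mixed="true", the case both validators reject.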