I was experimenting with xsi:type and noticed that when the xsi:type attribute is present in the root element, it appears that the root element's name does not play any role in the XSD validation.
SSCCE follows.
A.xsd is:
<xs:schema targetNamespace="foo://a"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns="foo://a">
<xs:element name="type" type="Type"/>
<xs:simpleType name="Type">
<xs:restriction base="xs:token">
<xs:enumeration value="Archive"/>
<xs:enumeration value="Organisation"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>
Given the above schema, the following XML document (a.xml) is clearly valid against it:
<a:type xmlns:a="foo://a"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="foo://a A.xsd"
xsi:type="a:Type">
Organisation
</a:type>
What's puzzling is that Xerces reports that the following instance document (a-v2.xml) is also valid:
<absurdRootElementName xmlns:a="foo://a"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="foo://a A.xsd"
xsi:type="a:Type">
Organisation
</absurdRootElementName>
To demonstrate that, I downloaded Xerces2 for Java from this link, exploded the tarball and placed three jar files (xercesImpl.jar, xml-apis.jar and xercesSamples.jar, as named on the classpath below) in a certain location.
I then wrote this validation script:
$ cat validate
#!/bin/bash
XERCES_HOME=~/your-chosen-location
java -classpath "$XERCES_HOME/xercesImpl.jar:$XERCES_HOME/xml-apis.jar:$XERCES_HOME/xercesSamples.jar" sax.Counter "$@"
Invoking the validate script with:
validate -v -n -np -s -f a.xml
... demonstrates that Xerces validates both a.xml and a-v2.xml as correct.
In contrast, xmllint, when invoked with:
xmllint -schema A.xsd a.xml
... validates the first version but complains about the second:
Schemas validity error : Element 'absurdRootElementName': No matching global declaration available for the validation root.
My questions are:
does setting the xsi:type mean I can set the root element's name to any value I like without impacting XSD validation outcome?
which tool is right, Xerces or xmllint?

Unlike some other validation languages, XSD does not define a single, simple Boolean-valued concept of "validity against a given schema". So the answer to your questions is "it depends."
XSD defines several ways in which an XSD validator can be requested to assess the schema-validity of a document; it does not forbid others:
type-driven validation: we can specify a type definition in the schema and a node in the document and ask "is this node valid against this type?"
In that case, an xsi:type attribute on the element node we specified will not override the type we specified. (I almost said "it will have no effect", but it can: if xsi:type is specified, its value must name a type actually present in the schema; if there is no top-level type with the indicated QName, the xsi:type attribute is invalid and its parent will be invalid.)
element-driven validation: we can specify an element declaration in the schema and an element node in the document and ask "is this element valid against this declaration?"
In that case, the element instance will be validated against the element declaration we have specified. An xsi:type attribute on the element node we specified will (in the absence of errors) override the type specified. If the type specified in the instance cannot be found in the schema, or if it's found but is not validly derived from the declared type, or if some other rules are violated, then there will be validity issues.
attribute-driven validation doesn't apply in the case you're talking about; it involves specifying an attribute declaration in the schema and an attribute node in the instance and asking "is this attribute instance valid against this attribute declaration?".
lax-wildcard validation: we can specify a node in the document instance and ask "is this node valid against its declaration in the schema, if any?"
In lax-wildcard validation, the expectation is that at the application level a valid node counts as success, an invalid node counts as failure, and a node whose validity is unknown (because there is no such declaration) counts as success. In this case, an xsi:type attribute on the validation root will be taken as identifying a governing type definition.
strict-wildcard validation: this is essentially like lax-wildcard validation, except that if the validity of the validation root is unknown (because there is no governing element declaration or type definition), then it counts at the application level as failure.
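As a rough illustration only, the way the wildcard modes pick a governing component, and the way an "unknown" outcome is mapped to success or failure, can be modelled in a few lines of Java. The names below (Mode, Governs, Outcome, governingComponent, countsAsSuccess) are invented for this sketch and are not part of any XSD processor's API. The model also ignores everything that happens after the governing component is chosen, such as checking that an xsi:type is validly derived from a declared type.

```java
class ValidationModes {
    enum Mode { TYPE_DRIVEN, ELEMENT_DRIVEN, LAX_WILDCARD, STRICT_WILDCARD }
    enum Governs { REQUESTED_TYPE, REQUESTED_DECLARATION, MATCHING_DECLARATION, XSI_TYPE, NOTHING }
    enum Outcome { VALID, INVALID, UNKNOWN }

    // Which schema component the assessment of the validation root starts from.
    static Governs governingComponent(Mode mode, boolean declarationFound, boolean xsiTypeResolves) {
        switch (mode) {
            case TYPE_DRIVEN:
                return Governs.REQUESTED_TYPE;        // the caller named the type
            case ELEMENT_DRIVEN:
                return Governs.REQUESTED_DECLARATION; // the caller named the declaration
            default:                                  // lax or strict wildcard
                if (declarationFound) return Governs.MATCHING_DECLARATION;
                if (xsiTypeResolves) return Governs.XSI_TYPE;
                return Governs.NOTHING;
        }
    }

    // How an application would typically treat the assessment outcome.
    // UNKNOWN is only expected to arise in the wildcard modes.
    static boolean countsAsSuccess(Mode mode, Outcome outcome) {
        if (outcome == Outcome.UNKNOWN) return mode == Mode.LAX_WILDCARD;
        return outcome == Outcome.VALID;
    }

    public static void main(String[] args) {
        // Root with no matching declaration but a resolvable xsi:type.
        System.out.println(governingComponent(Mode.LAX_WILDCARD, false, true));
        // Nothing to validate against: lax accepts, strict rejects.
        System.out.println(countsAsSuccess(Mode.LAX_WILDCARD, Outcome.UNKNOWN));
        System.out.println(countsAsSuccess(Mode.STRICT_WILDCARD, Outcome.UNKNOWN));
    }
}
```

For a root element like absurdRootElementName with xsi:type="a:Type", this toy model selects XSI_TYPE in either wildcard mode, which is why the element's name plays no part there.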
Many command-line tools default to lax-wildcard validation of the outermost element in the XML input: they look in the schema for a top-level declaration for it, validate against that declaration if they find one, and are silent if they don't find one. (This leads to the bizarre consequence that if no declaration is found owing to a namespace error or for whatever reason, it may look to the user as if the document were valid, instead of merely being not known to be invalid.)
In the cases you describe, Xerces appears to be defaulting to lax (or strict) wildcard mode, finding the xsi:type attribute, and correctly declaring the document valid. The xmllint processor appears to be defaulting to a different mode, slightly different from any of those described by the spec, in which a top-level element declaration is sought and an error message is issued if it's not found. That's very similar to strict-wildcard validation mode as defined in the XSD spec, but it appears to exclude the possibility of validating against a type specified in the instance by an xsi:type attribute. Under those circumstances, xmllint quite correctly reports that the root element of the document is not valid against any top-level element declaration in the schema.
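If you want to try the comparison without downloading anything, a minimal sketch using the JDK's built-in javax.xml.validation API might look like the following. It inlines A.xsd from the question and omits xsi:schemaLocation, since the schema is supplied programmatically; the class and method names (RootNameCheck, instance, isValid) are invented for the example. The JDK's implementation is derived from Xerces, so I would expect it to treat the renamed root the same way as the Xerces run in the question, but that is an assumption rather than something the spec guarantees, so the sketch prints the result instead of asserting it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class RootNameCheck {
    // The schema from the question, inlined so the example needs no files.
    static final String XSD =
        "<xs:schema targetNamespace='foo://a'"
      + " xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns='foo://a'>"
      + "<xs:element name='type' type='Type'/>"
      + "<xs:simpleType name='Type'><xs:restriction base='xs:token'>"
      + "<xs:enumeration value='Archive'/>"
      + "<xs:enumeration value='Organisation'/>"
      + "</xs:restriction></xs:simpleType></xs:schema>";

    // Builds a document like a.xml or a-v2.xml with the given root element name.
    static String instance(String rootName) {
        return "<" + rootName + " xmlns:a='foo://a'"
             + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
             + " xsi:type='a:Type'>Organisation</" + rootName + ">";
    }

    // Returns true if the validator reports no error for the document.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
        try {
            // With no ErrorHandler set, validation errors surface as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("a:type                -> " + isValid(instance("a:type")));
        System.out.println("absurdRootElementName -> " + isValid(instance("absurdRootElementName")));
    }
}
```

The first line should report true for any conformant processor. The second line is the interesting one: it tells you which default validation question your particular processor is answering.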
Now we can extend the "It depends" answer more informatively.
does setting the xsi:type mean I can set the root element's name to any value I like without impacting XSD validation outcome?
It depends on what kind of validation you're performing.
Cases where the answer is "yes" include these:
The name of the root element never has any effect on validation episodes that start from some other node, so in all cases where you request validation of an attribute or of an element other than the root element, you can indeed set the root element's name to whatever you like.
In cases where you request type-driven validation, or cases where you request lax or strict wildcard validation and the schema has no top-level element declaration matching the root element, the name of the root element will have no effect on schema-validity assessment.
Cases where the answer is "no":
In cases where you request element-driven validation, the name of the root element must match the name given on the element declaration.
In cases where (a) you request wildcard validation and (b) the schema has a matching element declaration, then the name of the root element will have an effect on schema-validity assessment, because it will determine which element declaration is selected as the governing element declaration, which will in turn determine whether the instance-specified type is or is not validly derived from the declared type of the element.
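To make that last case concrete, here is a sketch using the JDK's javax.xml.validation API. It assumes a variant of the schema from the question with one extra top-level declaration, count of type xs:int; that declaration is hypothetical and is not in the original A.xsd, and the class and method names are likewise invented. Both documents carry the same xsi:type="a:Type" and the same content; only the root element's name differs.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

class NameMattersCheck {
    // A.xsd from the question plus a hypothetical second declaration, 'count'.
    static final String XSD =
        "<xs:schema targetNamespace='foo://a'"
      + " xmlns:xs='http://www.w3.org/2001/XMLSchema' xmlns='foo://a'>"
      + "<xs:element name='type' type='Type'/>"
      + "<xs:element name='count' type='xs:int'/>"
      + "<xs:simpleType name='Type'><xs:restriction base='xs:token'>"
      + "<xs:enumeration value='Archive'/>"
      + "<xs:enumeration value='Organisation'/>"
      + "</xs:restriction></xs:simpleType></xs:schema>";

    // Same xsi:type and content every time; only the root name varies.
    static String instance(String rootName) {
        return "<" + rootName + " xmlns:a='foo://a'"
             + " xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
             + " xsi:type='a:Type'>Organisation</" + rootName + ">";
    }

    // Returns true if the validator reports no error for the document.
    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Declared type is a:Type, so xsi:type='a:Type' is acceptable.
        System.out.println("a:type  -> " + isValid(instance("a:type")));
        // Declared type is xs:int; a:Type is not derived from it.
        System.out.println("a:count -> " + isValid(instance("a:count")));
    }
}
```

Here a:type is accepted and a:count is rejected, because the name selects the governing element declaration and a:Type is not validly derived from xs:int.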
which tool is right, Xerces or xmllint?
It depends on how you define rightness.
The XSD spec defines several ways of requesting validation (and in 1.1 assigns the names given above for them), but it very clearly does not attempt to define user or application interfaces as part of the standard, so there is no 'right' or 'wrong' with respect to the exact formulation of the validation question a schema-validator assumes when you call its default validation routine.