I'm working with C# and .NET 8.0.6.
I have an XSD that declares a "root" element with xs:token content. When I validate a document whose instance element has padded content, the padding is preserved. I expected the padding (leading and trailing whitespace) to be removed.
The XSD is:
<?xml version='1.0'?>
<xs:schema
    targetNamespace="http://example.org/scratch"
    xmlns="http://example.org/scratch"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="xs:token"/>
</xs:schema>
The instance is:
<?xml version='1.0'?>
<pre:root xmlns:pre="http://example.org/scratch"> abc </pre:root>
My code does the following:
1. Loads the XSD into an XmlSchema.
2. Adds the XmlSchema to an XmlSchemaSet, which is then compiled.
3. Creates an XmlReader (XsdValidatingReader) using:
   var xmlReaderSettings = new XmlReaderSettings
   {
       CheckCharacters = true,
       DtdProcessing = DtdProcessing.Prohibit,
       Schemas = xmlSchemaSet,
       ValidationType = ValidationType.Schema,
   };
4. Loads the instance with XmlDocument.Load(xmlReader).
The document object yields:
<?xml version="1.0"?>
<pre:root xmlns:pre="http://example.org/scratch"> abc </pre:root>
In particular, the XmlDocument.DocumentElement has:
The padding isn't removed because XML Schema validation doesn't modify an instance document's content.
I wrongly assumed that any whitespace normalization applied to an xs:token is persisted. However, no such modifications are prescribed in the XSD specification. Any normalization is internal to the validation process.
Although instance content isn't modified during validation, it can be augmented. Adding default values is one example. Adding information to the post-schema-validation infoset (PSVI) is another.
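As a rough illustration of the default-value case, here is a minimal sketch. The "note" element and its "lang" attribute with default "en" are invented for this example and are not part of the question's schema. It assumes the validating reader reports schema defaults as attributes, so that XmlDocument.Load adds them to the tree as unspecified attributes.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

// Hypothetical schema: "note" and its "lang" default are assumptions for illustration.
const string xsd = @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='note'>
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base='xs:string'>
          <xs:attribute name='lang' type='xs:string' default='en'/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>";

// The instance does not specify the "lang" attribute.
const string instance = "<note>hello</note>";

var xmlSchemaSet = new XmlSchemaSet();
using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
{
    xmlSchemaSet.Add(null, xsdReader);
}
xmlSchemaSet.Compile();

var xmlReaderSettings = new XmlReaderSettings
{
    Schemas = xmlSchemaSet,
    ValidationType = ValidationType.Schema,
};

var document = new XmlDocument();
using (var xmlReader = XmlReader.Create(new StringReader(instance), xmlReaderSettings))
{
    document.Load(xmlReader);
}

// Look up the attribute that the instance omitted.
XmlAttribute? lang = document.DocumentElement?.GetAttributeNode("lang");
Console.WriteLine($"lang present: {lang != null}");
Console.WriteLine($"lang value: {lang?.Value}");
// Specified distinguishes attributes written in the instance from defaulted ones.
Console.WriteLine($"explicitly specified: {lang?.Specified}");
```

If the assumption holds, the loaded element carries a "lang" attribute even though the instance text never mentions it, which is augmentation rather than modification of existing content.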
In .NET and C#, an XML node's PSVI information can be accessed via XmlNode.SchemaInfo and used to perform a post-validation transformation. For example, given a SchemaInfo.SchemaType.TypeCode of XmlTypeCode.Token, an application could normalize the content's whitespace.
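A minimal sketch of such a post-validation pass, using the schema and instance from the question (inlined with single-quoted attributes for brevity). The helpers CollapseWhitespace and NormalizeTokenContent are hypothetical names introduced here, not part of .NET. The sketch handles only element content whose type code is exactly XmlTypeCode.Token; attributes and other whitespace-collapsing types are left out.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Schema;

// Schema and instance from the question.
const string xsd = @"<xs:schema targetNamespace='http://example.org/scratch'
    xmlns='http://example.org/scratch'
    xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='root' type='xs:token'/>
</xs:schema>";
const string instance =
    "<pre:root xmlns:pre='http://example.org/scratch'> abc </pre:root>";

var xmlSchemaSet = new XmlSchemaSet();
using (var xsdReader = XmlReader.Create(new StringReader(xsd)))
{
    xmlSchemaSet.Add(null, xsdReader);
}
xmlSchemaSet.Compile();

var xmlReaderSettings = new XmlReaderSettings
{
    CheckCharacters = true,
    DtdProcessing = DtdProcessing.Prohibit,
    Schemas = xmlSchemaSet,
    ValidationType = ValidationType.Schema,
};

var document = new XmlDocument();
using (var xmlReader = XmlReader.Create(new StringReader(instance), xmlReaderSettings))
{
    document.Load(xmlReader);
}

// Hypothetical helper: XSD "collapse" semantics. Runs of space, tab, CR and LF
// become a single space, then leading and trailing spaces are removed.
static string CollapseWhitespace(string value) =>
    Regex.Replace(value, "[ \t\r\n]+", " ").Trim(' ');

// Hypothetical helper: walk the tree and rewrite the content of every element
// whose PSVI type code is Token.
static void NormalizeTokenContent(XmlElement element)
{
    if (element.SchemaInfo.SchemaType?.TypeCode == XmlTypeCode.Token)
    {
        element.InnerText = CollapseWhitespace(element.InnerText);
        return;
    }
    foreach (XmlNode child in element.ChildNodes)
    {
        if (child is XmlElement childElement)
        {
            NormalizeTokenContent(childElement);
        }
    }
}

if (document.DocumentElement is XmlElement rootElement)
{
    NormalizeTokenContent(rootElement);
    // Brackets make any remaining padding visible.
    Console.WriteLine($"[{rootElement.InnerText}]");
}
```

The transformation runs after loading because SchemaInfo is only populated when the document was read through a validating reader. An application that also needs to cover types derived from xs:token, or attribute values, would have to extend the type check accordingly.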