
Why doesn't my code that uses System.Xml.XmlReader detect an invalid XML file?


Goal

Using PowerShell 5.1, detect an invalid XML file by validating it against an XML schema with Microsoft's System.Xml.XmlReader. I plan to detect the invalid file by catching the XmlException that XmlReader throws on an XML parse error.

Note: I don't want to use PowerShell Community Extensions Test-Xml cmdlet.

The problem

The line $readerResult = $xmlReader.Read() does not throw the XmlException I expect when parsing an invalid XML file.

References

Validation Using the XmlSchemaSet

XmlReader Class

My XSD

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="urn:config-file-schema">
  <xs:element name="notes">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="note" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="to"/>
              <xs:element name="from">
                <xs:complexType>
                  <xs:simpleContent>
                    <xs:extension base="xs:string">
                      <xs:attribute type="xs:byte" name="type" use="optional"/>
                    </xs:extension>
                  </xs:simpleContent>
                </xs:complexType>
              </xs:element>
              <xs:element type="xs:string" name="heading"/>
              <xs:element type="xs:string" name="body"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

My invalid XML (second line uses bogus element name notXXXes)

<?xml version="1.0" encoding="UTF-8"?>
<notXXXes xmlns="urn:config-file-schema">
    <note>
        <to>Tove</to>
        <from type="1">Jani</from>
        <heading>Reminder</heading>
        <body>Don't forget me this weekend!</body>
    </note>
    <note>
        <to>Bob</to>
        <from type="2">KeyW</from>
        <heading>Reminder</heading>
        <body>I won't</body>
    </note>
</notes>

My code

When run, $readerResult is True, indicating that the next node was read successfully. I expect $xmlReader.Read() to throw an XmlException because the XML file content violates the schema.

cls
$error.clear()

try
{

    [System.Xml.Schema.XmlSchemaSet] $schemaSet = New-Object -TypeName System.Xml.Schema.XmlSchemaSet
    $schemaSet.Add("urn:config-file-schema","C:\Users\x\Desktop\test.xsd");

    [System.Xml.XmlReaderSettings] $readerSettings = New-Object -TypeName System.Xml.XmlReaderSettings
    $readerSettings.Schemas = $schemaSet
    $readerSettings.ValidationType = [System.Xml.ValidationType]::Schema
    $readerSettings.ConformanceLevel = [System.Xml.ConformanceLevel]::Fragment
    $readerSettings.IgnoreWhitespace = $true;
    $readerSettings.IgnoreComments = $true;

    [System.Xml.XmlReader]$xmlReader = [System.Xml.XmlReader]::Create("C:\Users\x\Desktop\test.xml", $readerSettings);

    #just to show that Schemas was set up OK
    "target namespace: " + $readerSettings.Schemas.Schemas().TargetNamespace

    $readerResult = $xmlReader.Read()

    "readerResult: " + $readerResult
}
catch
{
    "error: " + $error
}
finally
{
    $xmlReader.Close()
}

Edit #1

This fragment reads each node of the XML file (not each line) and displays its metadata:

while ($xmlReader.Read())
{
    # Write-Host is used here because write-console is not a built-in cmdlet
    Write-Host ("Depth:{0,1} Name:{1,-10} NodeType:{2,-15} Value:{3,-30}" -f $xmlReader.Depth, $xmlReader.Name, $xmlReader.NodeType, $xmlReader.Value)
}

Solution

  • The whole point of the XmlReader concept is that it's a streaming approach to dealing with XML. This allows you to access large/complex XML documents without having to hold the entire thing in memory (and, if you're using DOM-style access, several layers of additional memory usage to boot).

    This is efficient in terms of memory use, but it means an error is reported only when the reader reaches the node that causes it.

    The first Read here reads only the XML declaration, <?xml version="1.0" encoding="UTF-8"?>, which is well formed and so raises no error. To validate the entire document you need to keep calling Read until it reaches the end. But if validation is your only purpose, I'd probably defer to something like the Test-Xml cmdlet that you're dismissing.
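
The "read it through to the end" advice can be sketched roughly as follows. This is a minimal sketch, not the asker's exact script: the function name Test-XmlAgainstSchema and its parameters are hypothetical, and ConformanceLevel.Fragment is left out on the assumption that a whole document is being validated. One detail worth knowing: when no ValidationEventHandler is attached, a schema violation surfaces as System.Xml.Schema.XmlSchemaValidationException, while a well-formedness problem surfaces as System.Xml.XmlException, so the sketch catches both.

```powershell
# Hypothetical helper (not from the original post): returns $true when the
# file is well formed and valid against the schema, $false otherwise.
function Test-XmlAgainstSchema {
    param(
        [Parameter(Mandatory)] [string] $XmlPath,
        [Parameter(Mandatory)] [string] $XsdPath,
        [Parameter(Mandatory)] [string] $TargetNamespace
    )

    $schemaSet = New-Object -TypeName System.Xml.Schema.XmlSchemaSet
    [void] $schemaSet.Add($TargetNamespace, $XsdPath)

    $settings = New-Object -TypeName System.Xml.XmlReaderSettings
    $settings.Schemas = $schemaSet
    $settings.ValidationType = [System.Xml.ValidationType]::Schema

    $reader = $null
    try {
        $reader = [System.Xml.XmlReader]::Create($XmlPath, $settings)

        # Errors are raised only when the offending node is reached,
        # so keep reading until Read() returns $false at end of file.
        while ($reader.Read()) { }

        return $true
    }
    catch [System.Xml.Schema.XmlSchemaValidationException] {
        # The document violates the schema.
        Write-Warning ("Schema violation: " + $_.Exception.Message)
        return $false
    }
    catch [System.Xml.XmlException] {
        # The document is not well-formed XML.
        Write-Warning ("Not well formed: " + $_.Exception.Message)
        return $false
    }
    finally {
        # Guard against Create() having failed before $reader was assigned.
        if ($reader) { $reader.Close() }
    }
}
```

With the paths from the question, a call might look like Test-XmlAgainstSchema -XmlPath "C:\Users\x\Desktop\test.xml" -XsdPath "C:\Users\x\Desktop\test.xsd" -TargetNamespace "urn:config-file-schema". The sketch stops at the first problem it meets. If you would rather collect every violation, XmlReaderSettings also exposes a ValidationEventHandler event that can be used instead of relying on exceptions.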