Goal
Using PowerShell 5.1, detect an invalid XML file by validating it against an XML schema with Microsoft's System.Xml.XmlReader. I'll detect the invalid XML file by catching the XmlException that XmlReader throws on an XML parse error.
Note: I don't want to use the PowerShell Community Extensions Test-Xml cmdlet.
The problem
The line of code $readerResult = $xmlReader.Read() does not throw the XmlException I expect when parsing an invalid XML file.
References
Validation Using the XmlSchemaSet
My XSD
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="urn:config-file-schema">
<xs:element name="notes">
<xs:complexType>
<xs:sequence>
<xs:element name="note" maxOccurs="unbounded" minOccurs="0">
<xs:complexType>
<xs:sequence>
<xs:element type="xs:string" name="to"/>
<xs:element name="from">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:string">
<xs:attribute type="xs:byte" name="type" use="optional"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
<xs:element type="xs:string" name="heading"/>
<xs:element type="xs:string" name="body"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
My invalid XML (the second line uses the bogus element name notXXXes)
<?xml version="1.0" encoding="UTF-8"?>
<notXXXes xmlns="urn:config-file-schema">
<note>
<to>Tove</to>
<from type="1">Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
<note>
<to>Bob</to>
<from type="2">KeyW</from>
<heading>Reminder</heading>
<body>I won't</body>
</note>
</notes>
My code
When run, $readerResult is True, indicating that the next node was read successfully. I expect $xmlReader.Read() to throw an XmlException because the XML file content violates the schema.
cls
$error.Clear()
try
{
    [System.Xml.Schema.XmlSchemaSet] $schemaSet = New-Object -TypeName System.Xml.Schema.XmlSchemaSet
    # [void] stops the returned XmlSchema object from being written to the output
    [void] $schemaSet.Add("urn:config-file-schema", "C:\Users\x\Desktop\test.xsd")
    [System.Xml.XmlReaderSettings] $readerSettings = New-Object -TypeName System.Xml.XmlReaderSettings
    $readerSettings.Schemas = $schemaSet
    $readerSettings.ValidationType = [System.Xml.ValidationType]::Schema
    $readerSettings.ConformanceLevel = [System.Xml.ConformanceLevel]::Fragment
    $readerSettings.IgnoreWhitespace = $true
    $readerSettings.IgnoreComments = $true
    [System.Xml.XmlReader] $xmlReader = [System.Xml.XmlReader]::Create("C:\Users\x\Desktop\test.xml", $readerSettings)
    # Just to show that Schemas was set up OK
    "target namespace: " + $readerSettings.Schemas.Schemas().TargetNamespace
    $readerResult = $xmlReader.Read()
    "readerResult: " + $readerResult
}
catch
{
    "error: " + $_.Exception.Message
}
finally
{
    # $xmlReader is $null if an earlier line (for example Create) failed
    if ($xmlReader) { $xmlReader.Close() }
}
Edit #1
This fragment reads each node of the XML file and displays its metadata:
while ($xmlReader.Read())
{
    Write-Host ("Depth:{0,1} Name:{1,-10} NodeType:{2,-15} Value:{3,-30}" -f $xmlReader.Depth, $xmlReader.Name, $xmlReader.NodeType, $xmlReader.Value)
}
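To see exactly where the streaming reader stops, the node dump above can be wrapped in a try/catch. The sketch below is hypothetical: the function name Show-XmlNodes and its parameters are illustrative, not part of any module. It leaves ConformanceLevel at its default (Document) because the file is a complete document, and it catches both XmlException (well-formedness errors) and XmlSchemaException (the base of XmlSchemaValidationException), because PowerShell may hand back the .NET exception wrapped in a MethodInvocationException. With the question's files, it should print the XML declaration and then stop at the undeclared notXXXes element.

```powershell
# Hypothetical helper: prints every node the validating reader returns and
# reports the first schema or well-formedness error with its line/position.
function Show-XmlNodes {
    param(
        [string] $XmlPath,
        [string] $SchemaPath,
        [string] $Namespace
    )

    $settings = New-Object -TypeName System.Xml.XmlReaderSettings
    [void] $settings.Schemas.Add($Namespace, $SchemaPath)
    $settings.ValidationType   = [System.Xml.ValidationType]::Schema
    $settings.IgnoreWhitespace = $true
    $settings.IgnoreComments   = $true

    $nodesRead = 0
    $failure   = $null
    $reader    = [System.Xml.XmlReader]::Create($XmlPath, $settings)
    try {
        while ($reader.Read()) {
            $nodesRead++
            Write-Host ("Depth:{0,1} Name:{1,-10} NodeType:{2,-15} Value:{3,-30}" -f
                $reader.Depth, $reader.Name, $reader.NodeType, $reader.Value)
        }
    }
    catch {
        # The .NET exception may be wrapped (e.g. in a MethodInvocationException),
        # so walk the InnerException chain until an XML exception is found.
        $ex = $_.Exception
        while ($ex -and -not ($ex -is [System.Xml.XmlException] -or
                              $ex -is [System.Xml.Schema.XmlSchemaException])) {
            $ex = $ex.InnerException
        }
        if (-not $ex) { throw }   # not an XML problem: rethrow unchanged
        $failure = $ex
        Write-Host ("Stopped after {0} node(s): {1} (line {2}, position {3})" -f
            $nodesRead, $ex.Message, $ex.LineNumber, $ex.LinePosition)
    }
    finally {
        $reader.Close()
    }

    # Summary object so callers can inspect how far the reader got.
    [pscustomobject]@{ NodesRead = $nodesRead; Error = $failure }
}
```

For example: Show-XmlNodes -XmlPath 'C:\Users\x\Desktop\test.xml' -SchemaPath 'C:\Users\x\Desktop\test.xsd' -Namespace 'urn:config-file-schema'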
The whole point of the XmlReader concept is that it's a streaming approach to dealing with XML. This allows you to access large or complex XML documents without having to hold the entire thing in memory (and, if you're using DOM-style access, several layers of additional memory usage to boot).
This is efficient in terms of memory use, but it does mean that errors are only reported as the nodes with issues are encountered.
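Because errors surface only as the offending nodes are read, one option for seeing every schema problem in a single pass is to attach a ValidationEventHandler to the XmlReaderSettings. With a handler attached, schema errors are passed to the handler instead of being thrown, so the reader keeps going; well-formedness errors (such as the question's mismatched </notes> end tag) are still thrown as XmlException and end the read. This is a sketch, and Get-XmlValidationErrors is a hypothetical name.

```powershell
# Hypothetical helper: streams through the whole file and returns every schema
# error, plus the well-formedness error (if any) that stopped the read.
function Get-XmlValidationErrors {
    param(
        [string] $XmlPath,
        [string] $SchemaPath,
        [string] $Namespace
    )

    $problems = New-Object -TypeName 'System.Collections.Generic.List[string]'

    $settings = New-Object -TypeName System.Xml.XmlReaderSettings
    [void] $settings.Schemas.Add($Namespace, $SchemaPath)
    $settings.ValidationType = [System.Xml.ValidationType]::Schema

    # With a handler attached, schema errors are reported here instead of being
    # thrown. GetNewClosure() binds the $problems list into the script block.
    $handler = {
        param($source, $e)
        $problems.Add(("{0}: {1} (line {2}, position {3})" -f
            $e.Severity, $e.Message, $e.Exception.LineNumber, $e.Exception.LinePosition))
    }.GetNewClosure()
    $settings.add_ValidationEventHandler($handler)

    $reader = [System.Xml.XmlReader]::Create($XmlPath, $settings)
    try {
        while ($reader.Read()) { }   # reading to the end validates every node
    }
    catch {
        # Unwrap a possible MethodInvocationException to reach the XML exception.
        $ex = $_.Exception
        while ($ex -and -not ($ex -is [System.Xml.XmlException])) {
            $ex = $ex.InnerException
        }
        if (-not $ex) { throw }   # not a well-formedness error: rethrow
        $problems.Add(("Not well-formed: {0}" -f $ex.Message))
    }
    finally {
        $reader.Close()
    }

    # The leading comma stops PowerShell from unrolling the list on output.
    , $problems
}
```

The trade-off is that the caller now has to inspect the returned list rather than rely on an exception to signal failure.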
The first Read here reads the XML declaration, <?xml version="1.0" encoding="UTF-8"?>, which appears well formed and should not raise any errors. If you need to validate the entire document, you'll need to keep calling Read until it reaches the end. But if that's your only purpose, I'd probably defer to, e.g., the Test-Xml cmdlet that you're dismissing.
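If Test-Xml is off the table, reading through to the end can be packaged as a small function that returns $true or $false. This is a hypothetical sketch (Test-XmlAgainstSchema is not a built-in or PSCX cmdlet). It relies on the default behaviour when no ValidationEventHandler is attached: the first schema error throws XmlSchemaValidationException, and the first well-formedness error throws XmlException. These are different types (XmlSchemaValidationException derives from XmlSchemaException, not from XmlException), so a catch that only looks for XmlException would miss schema violations such as the undeclared notXXXes root.

```powershell
# Hypothetical helper: returns $true only if the whole file is well-formed and
# valid against the schema; otherwise writes a warning and returns $false.
function Test-XmlAgainstSchema {
    param(
        [Parameter(Mandatory = $true)] [string] $XmlPath,
        [Parameter(Mandatory = $true)] [string] $SchemaPath,
        [Parameter(Mandatory = $true)] [string] $Namespace
    )

    $settings = New-Object -TypeName System.Xml.XmlReaderSettings
    [void] $settings.Schemas.Add($Namespace, $SchemaPath)
    $settings.ValidationType = [System.Xml.ValidationType]::Schema

    $reader = [System.Xml.XmlReader]::Create($XmlPath, $settings)
    try {
        # No ValidationEventHandler is attached, so the first schema error
        # throws XmlSchemaValidationException and the first well-formedness
        # error throws XmlException; reaching the end means the file passed.
        while ($reader.Read()) { }
        return $true
    }
    catch {
        # Unwrap a possible MethodInvocationException to reach the XML exception.
        $ex = $_.Exception
        while ($ex -and -not ($ex -is [System.Xml.XmlException] -or
                              $ex -is [System.Xml.Schema.XmlSchemaException])) {
            $ex = $ex.InnerException
        }
        if (-not $ex) { throw }   # not an XML problem: rethrow unchanged
        Write-Warning ("{0}: {1}" -f $ex.GetType().Name, $ex.Message)
        return $false
    }
    finally {
        $reader.Close()
    }
}
```

With the question's files, Test-XmlAgainstSchema -XmlPath 'C:\Users\x\Desktop\test.xml' -SchemaPath 'C:\Users\x\Desktop\test.xsd' -Namespace 'urn:config-file-schema' should return False, with a warning about the undeclared notXXXes element.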