I am working with an XML document in which elements from one namespace are nested inside elements from another. Using a PowerShell script, I need to loop through the XML nodes and select a value from each of the inner namespaces.
The problem I am having is that the value returned is always from the first set of data within the inner namespace.
Here is a representation of the XML file, named nsTest.xml:
<ns1:Root xmlns:ns1="example.com\ns1" xmlns:ns2="example.com\ns2" XsdSchemaValidatable="true">
<ns1:DataSet>
<ns1:TimeStamp>
<ns2:Time>2023-06-01T08:00:00</ns2:Time>
</ns1:TimeStamp>
</ns1:DataSet>
<ns1:DataSet>
<ns1:TimeStamp>
<ns2:Time>2024-07-02T08:00:00</ns2:Time>
</ns1:TimeStamp>
</ns1:DataSet>
</ns1:Root>
Here is the PowerShell script I am using:
Set-Location $PSScriptRoot
[Xml]$XMLData = Get-Content "nsTest.xml"
$nsmgr = New-Object System.Xml.XmlNamespaceManager($XMLData.NameTable)
$nsmgr.AddNamespace("ns1", "example.com\ns1")
$nsmgr.AddNamespace("ns2", "example.com\ns2")
$DataSets = $XMLData.SelectNodes("//ns1:Root/ns1:DataSet", $nsmgr)
Write-Host ("The number of items in the dataset is: " + $DataSets.Count)
Write-Host ("DataSets [0] is: " + $DataSets[0].SelectSingleNode("//ns1:TimeStamp/ns2:Time", $nsmgr).InnerText)
Write-Host ("DataSets [1] is: " + $DataSets[1].SelectSingleNode("//ns1:TimeStamp/ns2:Time", $nsmgr).InnerText)
Write-Host $DataSets[0].InnerXml
Write-Host $DataSets[1].InnerXml
Here are the results I am getting:
The number of items in the dataset is: 2
DataSets [0] is: 2023-06-01T08:00:00
DataSets [1] is: 2023-06-01T08:00:00
<ns1:TimeStamp xmlns:ns1="example.com\ns1"><ns2:Time xmlns:ns2="example.com\ns2">2023-06-01T08:00:00</ns2:Time></ns1:TimeStamp>
<ns1:TimeStamp xmlns:ns1="example.com\ns1"><ns2:Time xmlns:ns2="example.com\ns2">2024-07-02T08:00:00</ns2:Time></ns1:TimeStamp>
Here is what I would have expected to see:
The number of items in the dataset is: 2
DataSets [0] is: 2023-06-01T08:00:00
DataSets [1] is: 2024-07-02T08:00:00
<ns1:TimeStamp xmlns:ns1="example.com\ns1"><ns2:Time xmlns:ns2="example.com\ns2">2023-06-01T08:00:00</ns2:Time></ns1:TimeStamp>
<ns1:TimeStamp xmlns:ns1="example.com\ns1"><ns2:Time xmlns:ns2="example.com\ns2">2024-07-02T08:00:00</ns2:Time></ns1:TimeStamp>
I have tried using the [local-name() = 'Time'] convention instead of specifying the namespace, but that didn't make a difference.
Is there something about XML namespaces that I am not understanding?
Pragmatically speaking, you can use PowerShell's adaptation of the XML DOM,[1] which is namespace-agnostic, in combination with member-access enumeration:
# -> 2
$xmlData.Root.DataSet.Count
# -> @('2023-06-01T08:00:00', '2024-07-02T08:00:00')
$xmlData.Root.DataSet.TimeStamp.Time
# -> @(
# '<ns2:Time xmlns:ns2="example.com\ns2">2023-06-01T08:00:00</ns2:Time>',
# '<ns2:Time xmlns:ns2="example.com\ns2">2024-07-02T08:00:00</ns2:Time>'
# )
$xmlData.Root.DataSet.TimeStamp.InnerXml
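If you need to process one data set at a time rather than collect all values at once, the same dot notation works inside a loop. The following is a minimal sketch, assuming the sample document from the question; the inline here-string and the variable names $index and $lines are only there to make the example self-contained:

```powershell
# Inline copy of the sample XML (an assumption, so the sketch runs without nsTest.xml).
[xml]$xmlData = @'
<ns1:Root xmlns:ns1="example.com\ns1" xmlns:ns2="example.com\ns2" XsdSchemaValidatable="true">
  <ns1:DataSet>
    <ns1:TimeStamp>
      <ns2:Time>2023-06-01T08:00:00</ns2:Time>
    </ns1:TimeStamp>
  </ns1:DataSet>
  <ns1:DataSet>
    <ns1:TimeStamp>
      <ns2:Time>2024-07-02T08:00:00</ns2:Time>
    </ns1:TimeStamp>
  </ns1:DataSet>
</ns1:Root>
'@

$index = 0
$lines = foreach ($dataSet in $xmlData.Root.DataSet) {
    # Dot notation is evaluated relative to the current DataSet element,
    # so each iteration sees only its own TimeStamp/Time child.
    "DataSets [$index] is: $($dataSet.TimeStamp.Time)"
    $index++
}

# -> DataSets [0] is: 2023-06-01T08:00:00
# -> DataSets [1] is: 2024-07-02T08:00:00
$lines
```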
As for what you tried:
By starting your XPath query with //, you're starting the search at the root of the entire document rather than at the node on which you call .SelectSingleNode(), so the same node, the one under the first dataset, is found in both calls.
Simply use a relative path to avoid this problem, i.e. omit //:
# ...
# Note that "//" has been removed from the paths.
Write-Host ("DataSets [0] is: " + $DataSets[0].SelectSingleNode("ns1:TimeStamp/ns2:Time", $nsmgr).InnerText)
Write-Host ("DataSets [1] is: " + $DataSets[1].SelectSingleNode("ns1:TimeStamp/ns2:Time", $nsmgr).InnerText)
# ...
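If you do want a descendant search (for instance, because the Time element may sit at varying depths), prefix the path with a dot: .// searches the descendants of the node you call the method on, whereas a bare // always searches from the document root. A minimal sketch, assuming the sample document from the question; the inline here-string and the $times variable are only there to make it self-contained:

```powershell
# Inline copy of the sample XML (an assumption, so the sketch runs without nsTest.xml).
[xml]$xmlData = @'
<ns1:Root xmlns:ns1="example.com\ns1" xmlns:ns2="example.com\ns2" XsdSchemaValidatable="true">
  <ns1:DataSet>
    <ns1:TimeStamp>
      <ns2:Time>2023-06-01T08:00:00</ns2:Time>
    </ns1:TimeStamp>
  </ns1:DataSet>
  <ns1:DataSet>
    <ns1:TimeStamp>
      <ns2:Time>2024-07-02T08:00:00</ns2:Time>
    </ns1:TimeStamp>
  </ns1:DataSet>
</ns1:Root>
'@

$nsmgr = New-Object System.Xml.XmlNamespaceManager($xmlData.NameTable)
$nsmgr.AddNamespace('ns1', 'example.com\ns1')
$nsmgr.AddNamespace('ns2', 'example.com\ns2')

$times = foreach ($dataSet in $xmlData.SelectNodes('/ns1:Root/ns1:DataSet', $nsmgr)) {
    # The leading '.' anchors the descendant search at the current DataSet node.
    $dataSet.SelectSingleNode('.//ns2:Time', $nsmgr).InnerText
}

# -> 2023-06-01T08:00:00
# -> 2024-07-02T08:00:00
$times
```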
[1] In essence, PowerShell allows you to treat any parsed [xml] document as an object graph that you can drill into using regular dot notation, because PowerShell surfaces XML (child) elements and XML attributes as namespace-less properties on each object (XML node) in the graph.
See the third section of this answer for details.
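As a small illustration of the footnote, the sketch below uses a trimmed copy of the sample document with a single data set (an assumption made to keep it short; $doc and the other variable names are illustrative only). It shows an attribute and a text-only element both surfacing as string properties, while an element that has child elements surfaces as an XML node:

```powershell
# Trimmed, inline copy of the sample XML with one DataSet (an assumption for brevity).
[xml]$doc = @'
<ns1:Root xmlns:ns1="example.com\ns1" xmlns:ns2="example.com\ns2" XsdSchemaValidatable="true">
  <ns1:DataSet>
    <ns1:TimeStamp>
      <ns2:Time>2023-06-01T08:00:00</ns2:Time>
    </ns1:TimeStamp>
  </ns1:DataSet>
</ns1:Root>
'@

# An attribute is exposed as a string property, without any namespace prefix.
$attributeValue = $doc.Root.XsdSchemaValidatable       # -> 'true'

# An element containing only text is exposed as that text.
$timeValue = $doc.Root.DataSet.TimeStamp.Time          # -> '2023-06-01T08:00:00'

# An element containing child elements is exposed as an XML node.
$timeStampNode = $doc.Root.DataSet.TimeStamp
$timeStampNode.GetType().Name                          # -> XmlElement
```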