Tags: xml, powershell, xpath, cdata

How to read CDATA in XML file with PowerShell using a variable for the XML path?


I am having a difficult time reading CDATA from an XML file when I use a variable for the path to the element. (NOTE: This is based on the question "How to read CDATA in XML file with PowerShell?")

In the $xmlsource file:

<list>
  <topic>
    <SubTopic>
        <topicTitle>Test</topicTitle>
        <HtmlHead><![CDATA[<br>randomHTMLhere</br>]]></HtmlHead>
    </SubTopic>
    <SubTopic2>
        <topicTitle>Test2</topicTitle>
        <HtmlHead><![CDATA[<br>randomHTMLhere2</br>]]></HtmlHead>
    </SubTopic2>
  </topic>
</list>

In PowerShell:

[String]$xmlsource = "C:\PowerShell_scripts\xmlsource.xml"
[xml]$XmlContent = get-content $xmlsource    

#These methods work but the Paths are HARD-CODED
Write-host "`r`nUsing HARD-CODED Paths"
$XmlContent.list.topic.SubTopic.HtmlHead.'#cdata-section'
$XmlContent.list.topic.SubTopic.HtmlHead.InnerText
$XmlContent.list.topic.SubTopic2.HtmlHead.InnerText

#But if the path is given in a variable, I get nothing.
Write-host "`r`nUsing `$pathToElement (returns blank line)"
[String]$pathToElement = 'list.topic.SubTopic.HtmlHead'
$XmlContent.$pathToElement.InnerText        #This returns a blank line


#To add insult to injury:
#This kind of works, but parsing the path to fit the GetElementsByTagName method would be clunky and inflexible, and it would still return the CDATA from *both* HtmlHead elements.
Write-host "`r`nwith GetElementsByTagName(`$var)"
[String]$ElementName= 'HtmlHead'
$XmlContent.GetElementsByTagName($ElementName).'#cdata-section'
Write-host "`r`nwith GetElementsByTagName()"
$XmlContent.GetElementsByTagName('HtmlHead').'#cdata-section'

Does $pathToElement need to be cast as a special datatype?

NOTE: XPath is a query language for XML, so I corrected the question above.


Solution

  • $XmlContent.list.topic.SubTopic.HtmlHead 
    

    looks up a property called list, then looks up topic on the value that returns, then SubTopic on that result, and so on.

    $XmlContent.$pathToElement
    

    tries to look up one single property literally named list.topic.SubTopic.HtmlHead; no such property exists, so it returns nothing.

    I don't think 'list.topic.SubTopic.HtmlHead' is the right form for an XPath expression either; XPath separates the steps with '/'. You could do:

    $node = Select-Xml -Xml $XmlContent -XPath '/list/topic/SubTopic/HtmlHead' | Select-Object -ExpandProperty Node
    $node.InnerText
    

    Edit: and use

    Select-Xml -Xml $XmlContent -XPath '/list/topic//HtmlHead'

    to get both HtmlHead elements, the ones under SubTopic and SubTopic2.


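The property-by-property evaluation described in the answer can be reproduced by hand when the path lives in a variable: split it on '.' and look up one segment at a time. This is a minimal sketch, not part of the original answer. It inlines the question's XML as a here-string (an assumption, so it runs without C:\PowerShell_scripts\xmlsource.xml), $current is an illustrative variable name, and it assumes no element name in the path contains a literal dot.

```powershell
# Assumption: the question's xmlsource.xml, inlined so the sketch runs on its own.
[xml]$XmlContent = @'
<list>
  <topic>
    <SubTopic>
        <topicTitle>Test</topicTitle>
        <HtmlHead><![CDATA[<br>randomHTMLhere</br>]]></HtmlHead>
    </SubTopic>
    <SubTopic2>
        <topicTitle>Test2</topicTitle>
        <HtmlHead><![CDATA[<br>randomHTMLhere2</br>]]></HtmlHead>
    </SubTopic2>
  </topic>
</list>
'@

[String]$pathToElement = 'list.topic.SubTopic.HtmlHead'

# Walk the path one property at a time, the same way the hard-coded
# $XmlContent.list.topic.SubTopic.HtmlHead is evaluated.
# Assumption: no element name in the path contains a literal dot.
$current = $XmlContent
foreach ($segment in $pathToElement.Split('.')) {
    $current = $current.$segment
}

$current.InnerText           # <br>randomHTMLhere</br>
$current.'#cdata-section'    # <br>randomHTMLhere</br>
```

If a segment is misspelled, $current simply becomes $null (outside strict mode), which is the same silent blank result the question describes, so a null check after the loop may be worth adding.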
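Because SelectSingleNode and Select-Xml take the XPath expression as an ordinary string, it can be built at run time from the question's dotted $pathToElement. A sketch under stated assumptions: the XML is inlined as a here-string, $xpath and $node are illustrative names, and the simple '.' to '/' conversion only holds if no element name contains a literal dot.

```powershell
# Assumption: the question's xmlsource.xml, inlined so the sketch runs on its own.
[xml]$XmlContent = @'
<list>
  <topic>
    <SubTopic>
        <topicTitle>Test</topicTitle>
        <HtmlHead><![CDATA[<br>randomHTMLhere</br>]]></HtmlHead>
    </SubTopic>
    <SubTopic2>
        <topicTitle>Test2</topicTitle>
        <HtmlHead><![CDATA[<br>randomHTMLhere2</br>]]></HtmlHead>
    </SubTopic2>
  </topic>
</list>
'@

[String]$pathToElement = 'list.topic.SubTopic.HtmlHead'

# Convert the dotted path to an absolute XPath expression:
# 'list.topic.SubTopic.HtmlHead' becomes '/list/topic/SubTopic/HtmlHead'.
# Assumption: no element name in the path contains a literal dot.
$xpath = '/' + ($pathToElement -replace '\.', '/')

# The XPath is just a string, so a variable works here.
$node = $XmlContent.SelectSingleNode($xpath)
$node.InnerText

# The same lookup through the Select-Xml cmdlet.
Select-Xml -Xml $XmlContent -XPath $xpath |
    Select-Object -ExpandProperty Node |
    ForEach-Object { $_.InnerText }
```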
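The '//' step in the edited XPath matches HtmlHead at any depth below topic, which is what picks up both SubTopic and SubTopic2. A minimal sketch that collects the text of both CDATA sections, reusing the question's $ElementName variable inside the XPath string; the inlined XML and the $values name are assumptions for illustration.

```powershell
# Assumption: the question's xmlsource.xml, inlined so the sketch runs on its own.
[xml]$XmlContent = @'
<list>
  <topic>
    <SubTopic>
        <topicTitle>Test</topicTitle>
        <HtmlHead><![CDATA[<br>randomHTMLhere</br>]]></HtmlHead>
    </SubTopic>
    <SubTopic2>
        <topicTitle>Test2</topicTitle>
        <HtmlHead><![CDATA[<br>randomHTMLhere2</br>]]></HtmlHead>
    </SubTopic2>
  </topic>
</list>
'@

[String]$ElementName = 'HtmlHead'

# '//' matches $ElementName at any depth below /list/topic, so the HtmlHead
# elements under SubTopic and SubTopic2 are both returned, in document order.
$values = Select-Xml -Xml $XmlContent -XPath "/list/topic//$ElementName" |
    ForEach-Object { $_.Node.InnerText }

$values
```

To target only one of them, narrow the step instead of using '//', for example "/list/topic/SubTopic2/$ElementName".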