
PowerShell parsing of cdata-section


I'm trying to read an RSS feed using PowerShell, but I can't extract a cdata-section within the feed.

Here's a snippet of the feed (with a few items cut to save space):

<item rdf:about="http://philadelphia.craigslist.org/ctd/blahblah.html">
<title>
<![CDATA[2006 BMW 650I,BLACK/BLACK/SPORT/AUTO ]]>
</title>
...
<dc:title>
<![CDATA[2006 BMW 650I,BLACK/BLACK/SPORT/AUTO ]]>
</dc:title>
<dc:type>text</dc:type>
<dcterms:issued>2011-11-28T22:15:55-05:00</dcterms:issued>
</item>

And the PowerShell script:

$rssFeed = [xml](New-Object System.Net.WebClient).DownloadString('http://philadelphia.craigslist.org/sss/index.rss')
foreach ($item in $rssFeed.rdf.item) { $item.title } 

Which produces this:

#cdata-section
--------------
2006 BMW 650I,BLACK/BLACK/SPORT/AUTO 
2006 BMW 650I,BLACK/BLACK/SPORT/AUTO 

How do I extract the cdata-section?

I tried a few variants such as $item.title."#cdata-section" and $item.title.InnerText which return nothing. I tried $item.title | gm and I see the #cdata-section listed as a property. What am I missing?

Thanks.


Solution

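  • Each item in your snippet contains more than one element named title (both <title> and <dc:title>), so the title property is itself an array rather than a single element. That is most likely why $item.title."#cdata-section" returns nothing. Assuming $rss holds the feed's root element ($rssFeed.rdf in your script), the following should work: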

    $rss.item.title | select -expand "#cdata-section"
    

    or

    $rss.item.title[0]."#cdata-section"
    

    depending on whether you want every title or only the first one.
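
The same idea can be tried offline against an inline copy of the snippet. This is only a minimal sketch: the namespace URIs and the second item are assumptions added for illustration, and `$sample` and `$titles` are hypothetical names that do not appear in the question. It loops over the items as the question's script does and takes the first title element of each item before reading its CDATA text.

```powershell
# Inline stand-in for the downloaded feed. The namespace URIs and the second
# item are assumptions for illustration; only the first item's titles come
# from the question.
$sample = @'
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://philadelphia.craigslist.org/ctd/blahblah.html">
    <title><![CDATA[2006 BMW 650I,BLACK/BLACK/SPORT/AUTO ]]></title>
    <dc:title><![CDATA[2006 BMW 650I,BLACK/BLACK/SPORT/AUTO ]]></dc:title>
  </item>
  <item rdf:about="http://example.invalid/second.html">
    <title><![CDATA[Second listing]]></title>
    <dc:title><![CDATA[Second listing]]></dc:title>
  </item>
</rdf:RDF>
'@
$rssFeed = [xml]$sample

$titles = foreach ($item in @($rssFeed.RDF.item)) {
    # $item.title matches both <title> and <dc:title>, so force an array with
    # @() and index the first element before asking for the CDATA text.
    @($item.title)[0].'#cdata-section'
}
$titles
```

Wrapping `$item.title` in `@()` keeps the indexing safe even for an item that happens to carry only one title element. Piping `$item.title` to `select -expand "#cdata-section"` instead would return the text of both title elements for each item.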