
Parsing Atom files in PowerShell


I am trying to parse the Microsoft Windows 10 feed:

$feed = "https://support.microsoft.com/app/content/api/content/feeds/sap/en-us/6ae59d69-36fc-8e4d-23dd-631d98bf74a9/atom"
$resp = Invoke-WebRequest -Uri "$feed"

However, converting the response to XML with [xml]($resp.Content) fails with an error.
A quick workaround is to remove the initial (empty?) character:

[xml]($resp.Content.Substring(1))

What is the correct way to handle this?


Solution

  • As pointed out in the comments, you can either let Invoke-RestMethod take care of parsing the content for you:

    $atoms = Invoke-RestMethod -Uri "$feed"
    

    or you could use the -replace regex operator to trim formatting characters off the beginning of the string:

    $atomDoc = $resp.Content -replace '^\p{Cf}' -as [xml]
    

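    \p{Cf} matches any character in the Unicode Format category. That category includes the byte order mark (U+FEFF), which is the likely culprit at the start of this response. Because the pattern is anchored with ^, only a single leading character is removed.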
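
    A minimal sketch of the trimming approach, using a made-up XML string in place of the real response so that it runs offline. It assumes the offending leading character is a byte order mark (U+FEFF), which the question does not confirm. The variable names and the sample feed are illustrative only.

    ```powershell
    # Simulate a response body that starts with a byte order mark (U+FEFF).
    # The sample XML is invented for illustration; it is not the real feed.
    $bom = [string][char]0xFEFF
    $raw = $bom + '<feed xmlns="http://www.w3.org/2005/Atom"><title>demo</title></feed>'

    # -as returns $null instead of throwing when the conversion fails,
    # so the result can be tested without a try/catch.
    $direct = $raw -as [xml]

    # Strip one leading format character, then convert.
    $clean = $raw -replace '^\p{Cf}'
    $doc = $clean -as [xml]

    "Direct conversion succeeded:  $($null -ne $direct)"
    "Cleaned conversion succeeded: $($null -ne $doc)"
    "Feed title: $($doc.feed.title)"
    ```

    Since -as yields $null on failure, check the result before using it. Use a plain [xml] cast instead if you would rather get an exception.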


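    If you're looking for a more comprehensive sanitization of your input string, you can also remove any character that doesn't belong in an XML document. In .NET regular expressions, \x takes exactly two hex digits, so four-digit code points must be written with \u. Characters above U+FFFF are stored as surrogate pairs, so the surrogate range is kept in the allowed set:

    $resp.Content -replace '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]',''

    Note that U+FEFF is itself a legal XML character, so this pattern does not strip a leading byte order mark. Apply it together with the ^\p{Cf} replacement above.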
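
    To try the character filter without touching the network, it can be wrapped in a small function and run on a made-up string. This is only a sketch: the function name Remove-InvalidXmlCharacter and the sample input are hypothetical. The pattern is written with \u escapes and keeps surrogate code units so that characters above U+FFFF survive, which is an assumption about the desired behaviour (it also lets unpaired surrogates through).

    ```powershell
    # Hypothetical helper: removes UTF-16 code units that fall outside the
    # XML 1.0 character range. Surrogates are kept so that supplementary
    # characters (U+10000 to U+10FFFF) are not destroyed.
    function Remove-InvalidXmlCharacter {
        param([string]$Text)
        $Text -replace '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]', ''
    }

    # Made-up input: U+0001 and U+000B are not allowed in XML 1.0,
    # while U+4E2D (a CJK character) is allowed and must be preserved.
    $dirty = 'a' + [char]0x01 + 'b' + [char]0x0B + [char]0x4E2D + 'c'
    $cleaned = Remove-InvalidXmlCharacter -Text $dirty

    "Length before: $($dirty.Length)"
    "Length after:  $($cleaned.Length)"
    ```

    The two control characters are dropped and everything else is left untouched, so the string shrinks from six characters to four.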