I am trying to parse the Microsoft Windows 10 feed:
$feed = "https://support.microsoft.com/app/content/api/content/feeds/sap/en-us/6ae59d69-36fc-8e4d-23dd-631d98bf74a9/atom"
$resp = Invoke-WebRequest -Uri "$feed"
However, converting the response to XML with [xml]($resp.Content)
gives an error.
An easy fix would be removing the initial (empty?) character:
[xml]($resp.Content.Substring(1))
What is the correct way to do this?
As pointed out in the comments, you can either let Invoke-RestMethod
take care of the content parsing for you:
$atoms = Invoke-RestMethod -Uri "$feed"
or you could use the -replace
regex operator to trim formatting characters off the beginning of the string:
$atomDoc = $resp.Content -replace '^\p{Cf}' -as [xml]
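\p{Cf} matches any character in the Unicode Format category. That category includes the byte order mark (U+FEFF), which is most likely the invisible first character in this response.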
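To see the effect without hitting the network, here is a minimal sketch that uses a synthetic string in place of $resp.Content. The sample document and the variable names ($content, $hasFormatPrefix, $title) are made up for illustration, and it assumes the problem character really is a leading U+FEFF:

```powershell
# Synthetic stand-in for $resp.Content: a BOM (U+FEFF) followed by a tiny Atom document.
$content = [string][char]0xFEFF + '<feed xmlns="http://www.w3.org/2005/Atom"><title>demo</title></feed>'

# U+FEFF belongs to the Unicode Format (Cf) category, so \p{Cf} matches it.
$hasFormatPrefix = $content -match '^\p{Cf}'

# Strip one leading format character, then convert the remainder to XML.
$atomDoc = ($content -replace '^\p{Cf}') -as [xml]
$title   = $atomDoc.feed.title
```

Keep in mind that -as returns $null instead of throwing when the conversion fails, so check the result before using it.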
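If you're looking for a more comprehensive sanitization of your input string, you can also remove any character that isn't allowed in an XML 1.0 document. Note that .NET regex needs \uXXXX for four-digit code points (\x only reads two hex digits), and that characters above U+FFFF are stored as surrogate pairs, so the surrogate range is kept as well:
$resp.Content -replace '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]',''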
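If you need this in more than one place, the idea can be wrapped in a small helper. This is only a sketch: the function name Remove-InvalidXmlChar and the sample string are hypothetical, and the pattern assumes .NET regex syntax (\uXXXX escapes, surrogate code units kept so that characters above U+FFFF survive):

```powershell
function Remove-InvalidXmlChar {
    param([string]$Text)
    # Keep tab, LF, CR, U+0020-U+D7FF and U+E000-U+FFFD, plus surrogate
    # code units (U+D800-U+DFFF), which encode U+10000-U+10FFFF in UTF-16.
    $Text -replace '[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\uD800-\uDFFF]', ''
}

# Sample input: control characters U+0001 and U+000B, the non-character U+FFFE, and a tab.
$dirty = "a$([char]0x01)b$([char]0x0B)c$([char]0xFFFE)d`te"
$clean = Remove-InvalidXmlChar -Text $dirty

# A character outside the BMP (stored as a surrogate pair) passes through unchanged.
$emoji     = [char]::ConvertFromUtf32(0x1F600)
$keptEmoji = Remove-InvalidXmlChar -Text "x$emoji"
```

One caveat: U+FEFF is itself a legal XML character, so this filter does not strip a leading byte order mark. Combine it with the '^\p{Cf}' replacement if you need both.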