c# xml xml-parsing httpwebrequest xmldocument

Xml parsing error: hexadecimal value is an invalid character


I am trying to fetch an XML file with this code:

HttpWebRequest webReq = (HttpWebRequest)WebRequest.Create(url);
HttpWebResponse response = (HttpWebResponse)webReq.GetResponse();
string xml = string.Empty;
using (StreamReader sr = new StreamReader(response.GetResponseStream()))
{
    xml = sr.ReadToEnd();
}

XmlDocument xmlDoc = new XmlDocument();
//xml = xml.Replace((char)(0x1F), ' ');
xmlDoc.LoadXml(xml);

but I get the following exception:

' ', hexadecimal value 0x1F, is an invalid character. Line 1, position 1.

Following several similar questions on Stack Overflow, I tried enabling the commented-out line, but then I get a different exception:

Data at the root level is invalid. Line 1, position 2.

What's wrong?


Solution

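  • The byte 0x1F at line 1, position 1 is the first byte of the GZip magic number (0x1F 0x8B), which suggests the response body is compressed rather than plain XML. That is also why replacing the character does not help: the rest of the payload is still compressed binary data. Assuming the compression applied to the XML is GZip, you can decompress it like so: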

    // Requires: using System.IO; using System.IO.Compression; using System.Net; using System.Xml;
    HttpWebRequest webReq = (HttpWebRequest)WebRequest.Create(url);
    string xml = string.Empty;
    using (HttpWebResponse response = (HttpWebResponse)webReq.GetResponse())
    using (GZipStream gzip = new GZipStream(response.GetResponseStream(), CompressionMode.Decompress))
    using (StreamReader sr = new StreamReader(gzip))
    {
        // Read the decompressed text, not the raw compressed bytes.
        xml = sr.ReadToEnd();
    }

    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.LoadXml(xml);
    

    If GZipStream cannot decompress the response, the server is using a different compression scheme, and you'll have to replace GZipStream with the matching decompression stream (such as DeflateStream).
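
If it is not known in advance whether the server compresses the response, one option is to buffer the raw bytes and check for the GZip magic number (0x1F 0x8B) before choosing how to read them. The following is only a minimal sketch under stated assumptions: the class `XmlResponseReader` and its methods are hypothetical names that do not appear in the original answer, the payload is assumed to be UTF-8 when it carries no byte order mark, and only GZip is detected.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml;

// Hypothetical helper for illustration; not part of the original answer.
public static class XmlResponseReader
{
    // Returns true when the buffer starts with the GZip magic number 0x1F 0x8B.
    public static bool LooksLikeGZip(byte[] data)
    {
        return data != null && data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B;
    }

    // Turns the raw response bytes into text, decompressing only when the
    // GZip signature is present. Assumption: UTF-8 unless a BOM says otherwise.
    public static string ReadXmlText(byte[] data)
    {
        Stream source = new MemoryStream(data);
        if (LooksLikeGZip(data))
        {
            source = new GZipStream(source, CompressionMode.Decompress);
        }

        // Disposing the reader also disposes the wrapped stream(s).
        using (StreamReader reader = new StreamReader(source, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Parses the (possibly compressed) bytes into an XmlDocument.
    public static XmlDocument Load(byte[] data)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(ReadXmlText(data));
        return doc;
    }
}
```

To use it with the request from the question, the response stream could first be copied into a MemoryStream (for example with Stream.CopyTo) and the result of ToArray() passed to XmlResponseReader.Load. Alternatively, if the server marks the compression with a Content-Encoding header, setting webReq.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate before calling GetResponse() should let the original, unmodified reading code work.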