Tags: c#, xml, xml-entities

XmlReader read document with unescaped &s


I am trying to parse an XML document that I received as a string from a web service call.

String content = ...;//long xml document
using(TextReader reader = new StringReader(content))
using(XmlReader xml_reader = XmlReader.Create(reader, settings))
{
    XML = new XPathDocument(xml_reader);
}

However, I get an exception:

An error occurred while parsing EntityName. Line 1, position 1721.

I looked through the document around that position, and it was in the middle of a seemingly random tag. However, about 20-30 characters earlier I noticed unescaped ampersands (& characters), so I think that is the problem.

Running:

content.Substring(1700, 100);//results in the following text
"alue>1 time per day& with^honey~&water\\\\</Value></Frequency></Direction>          </Directions>     "
                    ^unescaped & char 1721 is the 'w'

How can I successfully read this document as XML?


Solution

  • Verify that the encoding you read the document with matches theirs (it is declared at the top of the document, something like <?xml version="1.0" encoding="ISO-8859-9"?>). Substitute the value from the web service's XML document for webserviceEncoding below.

    using(XmlReader r = XmlReader.Create(new StreamReader(fileName, Encoding.GetEncoding(webserviceEncoding)))) {
        XML = new XPathDocument( r );
        // ... 
    }
    

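    If that doesn't work:

    1. Escape the unescaped ampersands in the string (replace each bare & with &amp;) before loading it into an XML parser
    2. Notify the web service vendor that the document they return is not well-formed XML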
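
The string replacement in step 1 could be sketched roughly as follows. This is only an illustration, not a general-purpose repair: the class name XmlAmpersandFixer and its methods are made up for this example, and the pattern assumes the document uses only the five predefined XML entities and numeric character references. It would also rewrite ampersands inside CDATA sections or references to entities declared in a DTD, so check whether the vendor's documents contain either before relying on it.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.XPath;

// Hypothetical helper, named for this example only.
public static class XmlAmpersandFixer
{
    // Matches an '&' that does NOT begin one of the five predefined
    // entities (&amp; &lt; &gt; &quot; &apos;) or a numeric character
    // reference such as &#38; or &#x26;.
    private static readonly Regex BareAmpersand = new Regex(
        @"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)",
        RegexOptions.Compiled);

    // Returns a copy of the input with every bare '&' replaced by "&amp;".
    public static string EscapeBareAmpersands(string xml)
    {
        return BareAmpersand.Replace(xml, "&amp;");
    }

    // Repairs the string first, then loads it the same way as in the question
    // (without the question's XmlReaderSettings, which are not shown there).
    public static XPathDocument Load(string content)
    {
        string repaired = EscapeBareAmpersands(content);
        using (TextReader reader = new StringReader(repaired))
        using (XmlReader xmlReader = XmlReader.Create(reader))
        {
            return new XPathDocument(xmlReader);
        }
    }
}
```

With this in place, the loading code from the question would become something like `XML = XmlAmpersandFixer.Load(content);`. Ampersands that are already escaped are left alone, so running the repair on a well-formed document should not change it.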