Tags: ruby-on-rails, ruby, xml, soap, savon

XML fails on &#x1E; character


When I request data from my remote server, it responds with a value inside a node that contains the token &#x1E;, which makes parsing fail. When I manually removed the offending string, parsing started working.

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
...
    <sFName>Bradley</sFName>
    <sLName>L&#x1E;ibbra</sLName>
...

Token: &#x1E;

The error raised by Savon is:

Savon::InvalidResponseError: Unable to parse response body:


Solution

  • &#x1E; (U+001E, aka INFORMATION SEPARATOR TWO) is not an allowed character in XML 1.0:

    [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    

    Therefore your data is not XML, and any conformant XML processor must report an error such as the one you received.

    You must repair the data before passing it to any XML library: treat it as text, not XML, and remove the illegal characters, either manually or automatically.

    See also How to parse invalid (bad / not well-formed) XML?
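
A minimal sketch of the automatic repair described above, assuming the response body is available as a valid UTF-8 string before any XML parser sees it. The helper names (`legal_xml_codepoint?`, `strip_illegal_xml_chars`) are hypothetical and not part of Savon or any XML library. The sketch drops both numeric character references such as &#x1E; and raw characters that fall outside the Char production quoted above.

```ruby
# Matches numeric character references, hex (&#x1E;) or decimal (&#30;).
CHAR_REF = /&#(?:x([0-9A-Fa-f]+)|([0-9]+));/

# True if the code point is allowed by the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
def legal_xml_codepoint?(cp)
  cp == 0x9 || cp == 0xA || cp == 0xD ||
    (0x20..0xD7FF).cover?(cp) ||
    (0xE000..0xFFFD).cover?(cp) ||
    (0x10000..0x10FFFF).cover?(cp)
end

# Treats the payload as plain text (hypothetical helper, not a library API).
def strip_illegal_xml_chars(text)
  # 1. Drop character references that point at illegal code points.
  without_refs = text.gsub(CHAR_REF) do |ref|
    hex = Regexp.last_match(1)
    dec = Regexp.last_match(2)
    cp = hex ? hex.to_i(16) : dec.to_i
    legal_xml_codepoint?(cp) ? ref : ""
  end

  # 2. Drop raw characters that are themselves illegal.
  without_refs.each_char.select { |ch| legal_xml_codepoint?(ch.ord) }.join
end

raw_body = "<sLName>L&#x1E;ibbra</sLName>" # sample taken from the question
clean_body = strip_illegal_xml_chars(raw_body)
```

Note that this silently discards data. Whether removing the character is acceptable, rather than asking the server's owner to fix the output, depends on what the field is used for.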