php xml parsing malformed

Reading in Malformed XML (unencoded XML entities) with PHP


I'm having some trouble parsing malformed XML in PHP. In particular, I'm querying a third-party web service that returns XML without encoding the XML entities in the actual data. For example, one of the elements contains an ASCII heart, '<3' (without the quotes), which the XML parser sees as an opening tag. It should be '&lt;3'.

Right now I'm simply passing the XML string into a SimpleXMLElement, which predictably fails in these cases. I've done some looking around, and it seems like PHP's Tidy extension might be able to help, but the amount of configuration you can do is overwhelming :(

Thus, I'm just wondering if anyone else has had a problem like this and, if so, how they were able to solve it.

Thanks!


Solution

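  • Try tidy::repairString with the "input-xml" option (this requires the tidy extension). It escapes the stray '<' so that SimpleXMLElement can parse the result: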

    php > $tidy = new tidy();
    php > $repaired = $tidy->repairString("<foo>I <3 Philadelphia</foo>", array("input-xml"=>1));
    php > print($repaired);
    <foo>I &lt;3 Philadelphia</foo>
    php > $el = new SimpleXMLElement($repaired);
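
If most responses from the service are well-formed, it may be worth running tidy only when the strict parse fails. The following is a minimal sketch of that idea, not a definitive implementation. The helper name parseLenientXml is hypothetical, and it assumes the tidy extension is installed; without it the function simply returns null for malformed input.

```php
<?php
// Hypothetical helper (not part of PHP or tidy): parse strictly first,
// and fall back to tidy only when libxml rejects the input.
function parseLenientXml(string $xml): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $element = simplexml_load_string($xml);

    if ($element === false && class_exists('tidy')) {
        libxml_clear_errors();

        // "input-xml" tells tidy to treat the input as XML rather than HTML.
        $tidy = new tidy();
        $repaired = $tidy->repairString($xml, ['input-xml' => true]);

        $element = is_string($repaired)
            ? simplexml_load_string($repaired)
            : false;
    }

    // Restore the caller's libxml error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $element === false ? null : $element;
}

$el = parseLenientXml('<foo>I <3 Philadelphia</foo>');
if ($el === null) {
    echo "Could not parse or repair the XML", PHP_EOL;
} else {
    echo (string) $el, PHP_EOL;
}
```

Trying the strict parse first means well-formed documents are never rewritten by tidy, so any changes tidy makes are limited to input that would have failed anyway.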