
XML::Simple leaving entities in attribute text


I have two systems, one RHEL5 and one Ubuntu 10.04, and they exhibit different behavior. I'm using Perl's XML::Simple to parse the response of a call to some SaaS software. The response is:

    <xml answer="{&quot;foo&quot;: &quot;bar&quot;}" />

The Ubuntu system correctly returns {"foo": "bar"}, but the RHEL5 system leaves the &quot; entities in the attribute value, and I can't find an option to change this.

Yes, the XML::Simple versions are slightly different (and I cannot change that): 2.14 on RHEL5, 2.18 on Ubuntu. I'd love to solve this so that the behavior is consistent.


Solution

  • Delete the XML::SAX::PurePerl section from the ParserDetails.ini file whose path is printed by

    perl -MFile::Basename -E'say dirname($ARGV[0])."/SAX/ParserDetails.ini"' "`perldoc -l XML::SAX`"
    

    The module is awful!

    • It's slow. And I mean CRAZY slow.
    • It doesn't handle encodings correctly.
    • And apparently, it doesn't handle entities correctly either.

    If you want the best performance from XML::Simple, make sure to use

    local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';
    

    Caveat: XML::Parser doesn't handle namespaces.

    Note: XML::LibXML is still 17x faster than XML::Simple with XML::Parser.
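Instead of editing ParserDetails.ini by hand, XML::SAX's own class methods can list the registered parsers and unregister one. A minimal sketch, assuming XML::SAX is installed and that you run it with write access to ParserDetails.ini (often root); the --remove switch is a hypothetical flag of this script, not part of XML::SAX:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::SAX;

# Print every SAX parser currently registered in ParserDetails.ini.
for my $parser (@{ XML::SAX->parsers }) {
    print "$parser->{Name}\n";
}

# Only modify the system-wide file when explicitly asked to
# (hypothetical --remove flag). remove_parser drops the entry from the
# in-memory list; save_parsers writes that list back to ParserDetails.ini.
if (@ARGV && $ARGV[0] eq '--remove') {
    XML::SAX->remove_parser(q(XML::SAX::PurePerl))->save_parsers();
}
```

Run it once without arguments to check whether XML::SAX::PurePerl is listed before removing it.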
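A short sketch of the PREFERRED_PARSER setting applied to the response from the question, assuming XML::Parser is installed; decoded_answer is a hypothetical helper name. Because this tells XML::Simple to use XML::Parser directly instead of asking XML::SAX for a parser, it should also avoid the PurePerl entity problem on the RHEL5 box without touching ParserDetails.ini:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Hypothetical helper: parse $xml with XML::Simple, forcing the
# expat-based XML::Parser backend, and return the "answer" attribute.
sub decoded_answer {
    my ($xml) = @_;
    # 'local' restores the previous global value when the sub returns.
    local $XML::Simple::PREFERRED_PARSER = 'XML::Parser';
    # XMLin treats a string containing markup as XML rather than a filename;
    # with default options the root element is folded away, so its
    # attributes become keys of the returned hash.
    my $ref = XMLin($xml);
    return $ref->{answer};
}

my $xml = q{<xml answer="{&quot;foo&quot;: &quot;bar&quot;}" />};
print decoded_answer($xml), "\n";    # expected output: {"foo": "bar"}
```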