
Perl XML::Simple Inconsistent behavior


I am reading an XML file using XML::Simple.

However, I am facing a rather odd situation in which XML::Simple behaves inconsistently across hosts.

My best guess is that the shell plays some role, but I can't be sure, as I didn't find any such issue documented against XML::Simple.

Any pointers would be a great help in debugging this issue.

use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
sub readXml() {

    print "XML::Simple version : $XML::Simple::VERSION\n";

    my ($phRec) = eval {XMLin("sample.xml", ForceArray => 1, KeyAttr => [] )};
    if ( $@ ) {
        print $@;    # report the parser's error message as-is
        return 0;
    }
    print Dumper($phRec);
    return 1;
}

readXml();

sample.xml

<?xml version="1.0" encoding="utf-8"?>
<node>
    <people name="whatever">etc</people>
    <people name="abc <whatever> pqr">etc</people>
</node>

I understand this is not well-formed XML (the unescaped '<' in the attribute value), but I would expect XML::Simple to fail on both hosts.

Host1 [Development host]

bin: perl -v

This is perl 5, version 14, subversion 1 (v5.14.1) built for x86_64-linux ...

bin: echo $SHELL

/bin/bash

bin: ./template

XML::Simple version : 2.18
$VAR1 = {
          'people' => [
                      {
                        'content' => 'etc',
                        'name' => 'whatever'
                      },
                      {
                        'content' => 'etc',
                        'name' => 'abc <whatever> pqr'
                      }
                    ]
        };

Host2 [ VM ]

bash-4.1# perl -v

This is perl, v5.10.1 (*) built for x86_64-linux-thread-multi...

bash-4.1# echo $SHELL

/bin/csh

bash-4.1# ./template

XML::Simple version : 2.18
sample.xml:4: parser error : Unescaped '<' not allowed in attributes values
    <people name="abc <whatever> pqr">etc</people>
                      ^
sample.xml:4: parser error : attributes construct error
    <people name="abc <whatever> pqr">etc</people>
...

Solution

  • The XML parser used by XML::Simple on Host1 is apparently more lenient than the one on Host2.


    XML::Simple doesn't actually parse XML. It delegates that task to XML::Parser or XML::SAX. Even then, the latter itself delegates the parsing to one of many other modules.

    Not all of those parsers are of the same quality.

    Refer to the "Environment" section of XML::Simple's documentation for more info. That section documents a means to select the parser XML::Simple uses. However, you should take this chance to stop using XML::Simple! It's so complicated to use that its own documentation discourages people from using it!
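
    As a rough sketch of how one might check this on each host and then pin the parser: the script below reports which back ends are installed, lists the parsers registered with XML::SAX, and then forces XML::Parser through $XML::Simple::PREFERRED_PARSER, the variable described in that "Environment" section. The helper names backend_version and parse_strict are made up for illustration, and the script assumes XML::Parser is installed on both hosts. The error text on Host2 looks like libxml2's, which would suggest XML::LibXML is the SAX parser in use there, but that is only a guess until something like this is run on both machines.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical helper: version of a module, or undef if it is not installed.
sub backend_version {
    my ($module) = @_;
    return undef unless eval "require $module; 1";
    my $version = $module->VERSION;
    return defined $version ? $version : 'unknown';
}

# Hypothetical helper: returns (data, undef) on success, (undef, error) on failure.
sub parse_strict {
    my ($source) = @_;    # a file name or a string of XML
    my $data = eval { XMLin($source, ForceArray => 1, KeyAttr => []) };
    return defined $data ? ($data, undef) : (undef, $@ || 'unknown error');
}

# 1. Which candidate back ends does this host have?
for my $module (qw(XML::SAX XML::Parser XML::LibXML)) {
    my $version = backend_version($module);
    printf "%-12s %s\n", $module, defined $version ? $version : 'not installed';
}

# 2. Which parsers are registered with XML::SAX, if it is installed?
if (defined backend_version('XML::SAX')) {
    print "Registered SAX parser: $_->{Name}\n" for @{ XML::SAX->parsers };
}

# 3. Pin XML::Simple to one back end so every host behaves the same way.
#    Assumption: XML::Parser is installed everywhere the script runs.
$XML::Simple::PREFERRED_PARSER = 'XML::Parser';

my ($rec, $err) = parse_strict('sample.xml');
print defined $err ? "Parse failed: $err\n" : "Parsed OK\n";
```

    Running it on both hosts and comparing the first two sections of output should show whether the hosts really differ in their installed parsers. With the parser pinned, the unescaped '<' in sample.xml should be rejected on both, since expat (which XML::Parser wraps) does not accept it in attribute values.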