
Perl Hash using LibXML


I have XML data as follows.

<type>
   <data1>something1</data1>
   <data2>something2</data2>
</type>
<type>
   <data1>something1</data1>
   <data2>something2</data2>
</type>
<type>
   <data1>something1</data1>
</type>

As one can see, child node data2 is sometimes not present.

I have used this as a guideline to create the following code.

my %hash;
my $parser = XML::LibXML->new();
my $doc    = $parser->parse_file($file_name);
my @nodes  = $doc->findnodes("/type");

foreach my $node(@nodes)
{
    my $key = $node->getChildrenByTagName('data1');
    my $value = $node->getChildrenByTagName('data2');
    $hash{$key} = $value;
}

Later, I use this hash to generate more data, depending on whether the child node data2 is present.

I use the ne operator, assuming that %hash holds key/value pairs of strings and that, when data2 is not present, Perl inserts a space as the value (I printed the hash and saw only a space printed as the value).

However, I end up with the following error.

Operation "ne": no method found,
        left argument in overloaded package XML::LibXML::NodeList,
        right argument has no overloaded magic at filename.pl line 74.

How do I solve this? What is the best data structure for storing this data, given that a node is sometimes missing?


Solution

  • The first thing to realize is that $value is an XML::LibXML::NodeList object. It only looks like a string when you print it because it has stringification overloaded. You can check with ref $value.

    With my $value = $node->getChildrenByTagName('data2');, $value will always be a NodeList object. It might be an empty NodeList, but you'll always get a NodeList object.
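A minimal sketch of that check, assuming the question's fragments are wrapped in a hypothetical <types> root element (the sample values "a", "b" and "c" are made up for illustration). It calls getChildrenByTagName in scalar context and records the class and size of whatever comes back:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data: <types> is an assumed wrapper element,
# not something taken from the question.
my $xml = '<types>'
        . '<type><data1>a</data1><data2>b</data2></type>'
        . '<type><data1>c</data1></type>'
        . '</types>';
my $doc = XML::LibXML->load_xml(string => $xml);

my (@classes, @sizes);
for my $node ($doc->findnodes('/types/type')) {
    # Scalar context: a NodeList object comes back even when data2 is absent.
    my $value = $node->getChildrenByTagName('data2');
    push @classes, ref($value);
    push @sizes,   $value->size;
    printf "%s with %d node(s)\n", ref($value), $value->size;
}
```

The second <type> has no data2 child, yet $value is still an object; only its size differs.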


    Your version of XML::LibXML is out of date. Your version of XML::LibXML::NodeList has no string comparison overloading and, by default, Perl will not fall back to stringification for other string operators like ne. I reported this bug back in 2010 and it was fixed in 2011 in version 1.77.

    Upgrade XML::LibXML and the problem will go away.

    As a workaround, you can force stringification by quoting the NodeList object.

    if( "$nodelist" ne "foo" ) { ... }
    

    But really, update that module. There's been a lot of work done on it.

    "Perl inserts space as a value in the hash"

    This is a NodeList object stringifying. I get an empty string from an empty NodeList. You might be getting a space because of an old bug.

    You can also check $value->size to see if the NodeList is empty.
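As for the data structure, one possible approach (a sketch, not the only option) is to store plain strings in the hash and use undef to mark a missing data2, so that ordinary string operators and defined work no matter which XML::LibXML version is installed. The <types> wrapper element and the k1/v1/k2 values below are hypothetical sample data:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample data: <types> is an assumed wrapper element.
my $xml = '<types>'
        . '<type><data1>k1</data1><data2>v1</data2></type>'
        . '<type><data1>k2</data1></type>'
        . '</types>';
my $doc = XML::LibXML->load_xml(string => $xml);

my %hash;
for my $node ($doc->findnodes('/types/type')) {
    # findvalue returns a plain string, not a node or NodeList.
    my $key = $node->findvalue('data1');

    # List context: take the first data2 child, or undef if there is none.
    my ($data2) = $node->getChildrenByTagName('data2');

    # Store the text as a string; undef marks "no data2 here".
    $hash{$key} = defined $data2 ? $data2->textContent : undef;
}

for my $key (sort keys %hash) {
    if (defined $hash{$key}) {
        print "$key => $hash{$key}\n";
    }
    else {
        print "$key has no data2\n";
    }
}
```

Using undef rather than an empty string keeps "data2 is missing" distinct from "data2 is present but empty".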