How to list XML node attributes with XML::LibXML?


Given the following XML snippet:

<outline>
  <node1 attribute1="value1" attribute2="value2">
    text1
  </node1>
</outline>

How do I get this output?

outline
node1=text1
node1 attribute1=value1
node1 attribute2=value2

I have looked into XML::LibXML::Reader, but that module appears to provide access to attribute values only by name. How do I get the list of attribute names in the first place?


Solution

  • You can get the list of an element's attributes with $e->findnodes( "./@*");

    Below is a solution that uses plain XML::LibXML rather than XML::LibXML::Reader and works with your test data. It may be sensitive to extra whitespace and to mixed content, so test it on real data before using it.

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    use XML::LibXML;
    
    my $dom= XML::LibXML->load_xml( IO => \*DATA);
    my @elements= $dom->findnodes( "//*");
    
    foreach my $e (@elements)
      { print $e->nodeName;
    
        # text needs to be trimmed or line returns show up in the output
        my $text= $e->textContent;
        $text=~s{^\s+}{};
        $text=~s{\s+$}{};
    
        # only print the text of leaf elements (no child elements);
        # test the length so that a text of "0" is still printed
        if( ! $e->getChildrenByTagName( '*') && length $text)
          { print "=$text"; }
        print "\n";
    
        my @attrs= $e->findnodes( "./@*");
        # or, as suggested by Borodin, $e->attributes
    
        foreach my $attr (@attrs)
          { print $e->nodeName, " ", $attr->nodeName, "=", $attr->value, "\n"; }
      }
    __END__
    <outline>
      <node1 attribute1="value1" attribute2="value2">
        text1
      </node1>
    </outline>
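
Since the question started from XML::LibXML::Reader, here is a minimal sketch of how the attribute names could be listed with the reader as well, without knowing them in advance. It relies on the reader's moveToFirstAttribute / moveToNextAttribute / moveToElement methods. The helper name attribute_lines and the inline $xml string are only illustrative assumptions, and the sketch prints the attribute lines only, not the element names or text.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::LibXML::Reader;    # exports the XML_READER_TYPE_* constants by default

# sample document from the question (inlined here for illustration)
my $xml = <<'XML';
<outline>
  <node1 attribute1="value1" attribute2="value2">
    text1
  </node1>
</outline>
XML

# hypothetical helper: returns one "element attribute=value" string
# for every attribute found in the document
sub attribute_lines {
    my ($xml) = @_;

    my $reader = XML::LibXML::Reader->new( string => $xml )
        or die "cannot create reader\n";

    my @lines;
    while ( $reader->read == 1 ) {

        # skip text, whitespace, end tags, etc.
        next unless $reader->nodeType == XML_READER_TYPE_ELEMENT;

        my $element = $reader->name;

        # returns 1 only when the current element has at least one attribute
        if ( $reader->moveToFirstAttribute == 1 ) {
            do {
                # while positioned on an attribute, name and value refer to it
                push @lines, "$element " . $reader->name . "=" . $reader->value;
            } while ( $reader->moveToNextAttribute == 1 );

            # move back from the last attribute to the owning element
            $reader->moveToElement;
        }
    }
    return @lines;
}

print "$_\n" for attribute_lines($xml);
```

For the sample document this should print the two "node1 attributeN=valueN" lines. The reader streams the document instead of building a DOM, so this variant may be preferable for large files; for small inputs the findnodes version above is simpler.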