Iterating through elements using XML::LibXML in Perl


I have an XML file like the one below:

<?xml version="1.0"?>
<data>
  <header>
    <name>V9 Red Indices</name>
    <version>9</version>
    <date>2017-03-16</date>
  </header>
  <index>
    <indexfamily>ITRAXX-Asian</indexfamily>
    <indexsubfamily>iTraxx Rest of Asia</indexsubfamily>
    <paymentfrequency>3M</paymentfrequency>
    <recoveryrate>0.35</recoveryrate>
    <constituents>
      <constituent>
        <refentity>
          <originalconstituent>
            <referenceentity>ICICI Bank Limited</referenceentity>
            <redentitycode>Y1BDCC</redentitycode>
            <role>Issuer</role>
            <redpaircode>Y1BDCCAA9</redpaircode>
            <jurisdiction>India</jurisdiction>
            <tier>SNRFOR</tier>
            <pairiscurrent>false</pairiscurrent>
            <pairvalidfrom>2002-03-30</pairvalidfrom>
            <pairvalidto>2008-10-22</pairvalidto>
            <ticker>ICICIB</ticker>
            <ispreferred>false</ispreferred>
            <docclause>CR</docclause>
            <recorddate>2014-02-25</recorddate>
            <weight>0.0769</weight>
          </originalconstituent>
        </refentity>
        <refobligation>
          <type>Bond</type>
          <isconvert>false</isconvert>
          <isperp>false</isperp>
          <coupontype>Fixed</coupontype>
          <ccy>USD</ccy>
          <maturity>2008-10-22</maturity>
          <coupon>0.0475</coupon>
          <isin>XS0178885876</isin>
          <cusip>Y38575AQ2</cusip>
          <event>Matured</event>
          <obligationname>ICICIB 4.75 22Oct08</obligationname>
          <prospectusinfo>
            <issuers>
              <origissuersasperprosp>ICICI Bank Limited</origissuersasperprosp>
            </issuers>
          </prospectusinfo>
        </refobligation>
      </constituent>
    </constituents>
  </index>
</data>

I would like to iterate through this file without knowing the tag names in advance. My end goal is to build a hash that maps tag names to their values.

I do not want to call findnodes with a separate XPath expression for each node; that defeats the purpose of writing a generic loader.

I am using XML-LibXML 2.0126, a slightly older version.

Part of my code, which uses findnodes, is below. I have shortened the XML, although the question has still ended up lengthy :)

use XML::LibXML;

# $fileName holds the path to the XML file and is set elsewhere.
my $parser = XML::LibXML->new;
my $xmldoc = $parser->parse_file( $fileName );
my $root   = $xmldoc->getDocumentElement() or die "Could not get document element\n";

foreach my $index ( $root->findnodes( 'index' ) ) {    # every <index> under the root; $root->getChildNodes() is the alternative

    foreach my $constituent ( $index->findnodes( 'constituents/constituent' ) ) {    # every <constituent> of this index

        # Crude: the path is hard-coded. We should be iterating without knowing what is inside.
        my $referenceentity = $constituent->findnodes( 'refentity/originalconstituent/referenceentity' );

        print "referenceentity: " . $referenceentity . "\n";
        print "+++++++++++++++++++++++++++++++++++\n";
    }
}

Solution

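  • Use the nonBlankChildNodes, nodeName and textContent methods provided by XML::LibXML::Node. In the code below, $oc is assumed to be the element node whose children you want to load, such as the <originalconstituent> element: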

    my %hash;

    for my $node ( $oc->nonBlankChildNodes ) {
        my $tag   = $node->nodeName;
        my $value = $node->textContent;
        $hash{$tag} = $value;
    }

    This is equivalent to:

    my %hash = map { $_->nodeName, $_->textContent } $oc->nonBlankChildNodes;
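
The snippets above cover one level of children. As a minimal sketch of how this could extend to a generic loader, the script below first builds the flat hash for a single <originalconstituent> element and then walks the whole document recursively. The XML is a shortened copy of the sample from the question, inlined so the script runs on its own. The helper name node_to_data and the nested-hash layout are assumptions made for illustration; they are not part of XML::LibXML.

```perl
use strict;
use warnings;
use XML::LibXML;

# Shortened copy of the XML from the question, inlined for the example.
my $xml = <<'XML';
<?xml version="1.0"?>
<data>
  <index>
    <indexfamily>ITRAXX-Asian</indexfamily>
    <constituents>
      <constituent>
        <refentity>
          <originalconstituent>
            <referenceentity>ICICI Bank Limited</referenceentity>
            <redentitycode>Y1BDCC</redentitycode>
            <ticker>ICICIB</ticker>
          </originalconstituent>
        </refentity>
      </constituent>
    </constituents>
  </index>
</data>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# Flat case: one hash entry per child of a single element.
# The XPath is used only to pick a starting node for the demonstration.
my ($oc) = $doc->findnodes('//originalconstituent');
my %flat = map { $_->nodeName => $_->textContent } $oc->nonBlankChildNodes;

# Hypothetical helper: turn an element into nested data without naming any tag.
# An element with no element children becomes its text content;
# any other element becomes a hash reference keyed by child tag name.
sub node_to_data {
    my ($node) = @_;

    my @children = grep { $_->isa('XML::LibXML::Element') } $node->nonBlankChildNodes;
    return $node->textContent unless @children;

    my %data;
    $data{ $_->nodeName } = node_to_data($_) for @children;
    return \%data;
}

my $tree = node_to_data( $doc->documentElement );

print "$_ => $flat{$_}\n" for sort keys %flat;
print $tree->{index}{indexfamily}, "\n";
```

Note that this sketch keeps only the last of several siblings with the same tag name, so a document with more than one <constituent> would need the values collected into an array reference instead. It also ignores attributes and mixed content.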