
XML::Twig to filter XML parent nodes in Perl


I have an XML snippet:

<head>
 <a>
   <b  attr_1=1>
   <b  attr_1=2>
     <c  attr_2 =3  attr_3 =5/>
     <c  attr_2 =4  attr_3 =6 />
  </b>
 </a>
<a>
   <b  attr_1=1/>
   <b  attr_1=3>
     <c  attr_2 =3  attr_3 =5/>
     <c  attr_2 =10  attr_3 =10/ >
   </b>
 </a>
</head>

An <a> node is legitimate only if it has at least one <b attr_1 =3> child, and that <b> in turn has at least one <c> child with attr_2=10 and attr_3 =10. Thus the output file should contain only the following:

   <a>
       <b  attr_1=1/>
       <b  attr_1=3>(this is the legitimate value)
         <c  attr_2 =3  attr_3 =5/>
         <c  attr_2 =10  attr_3 =10/ >(this is the legitimate combination)
       </b>  
   </a>

My Code is

use strict;
use warnings;
use XML::Twig;

my $twig = new XML::Twig( twig_handlers => { a=> \&a} );
$twig->parsefile('1511.xml');
$twig->set_pretty_print('indented');
$twig->print_to_file('out.xml');

    sub a {

        my ( $twig, $a ) = @_ ;

        $a->cut
         unless grep { $_->att( 'attr_1' ) eq '3' } $a->children( 'b' )

    }

With this I am able to filter down to level . Can anybody please explain how to traverse and grep down to node c, which is inside node b?


Solution

  • Your XML file had some errors, and part of your description seems to have been deleted. Note that you can also put attribute restrictions on handlers and on the *child methods (such as children).

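    sub a {

      my ( $twig, $a ) = @_ ;
      my $cut = 1;   # cut this <a> unless a qualifying <c> is found

      # only look at <b> children with attr_1="3"
      foreach my $b ($a->children('b[@attr_1="3"]')){
        # $cut stays true only while no <c> has attr_2="10" and attr_3="10"
        $cut &&= not grep {$_->att('attr_2') eq '10'
                       and $_->att('attr_3') eq '10'} $b->children('c');
      }

      $a->cut if $cut;
    }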
    

    This is the file I used for testing:

    <head>
    <a>
       <b  attr_1="1" />
       <b  attr_1="2">
         <c  attr_2 ="3"  attr_3 ="5"/>
         <c  attr_2 ="4"  attr_3 ="6" />
      </b>
    </a>
    <a>
       <b  attr_1="1"/>
       <b  attr_1="3">
         <c  attr_2 ="3"  attr_3 ="5"/>
         <c  attr_2 ="10"  attr_3 ="10" />
       </b>
     </a>
    <a>
       <b  attr_1="1"/>
       <b  attr_1="3">
         <c  attr_2 ="3"  attr_3 ="5"/>
         <c  attr_2 ="10"  attr_3 ="12" />
       </b>
     </a>
    </head>
    

    The output:

    <head>
      <a>
        <b attr_1="1"/>
        <b attr_1="3">
          <c attr_2="3" attr_3="5"/>
          <c attr_2="10" attr_3="10"/>
        </b>
      </a>
    </head>
    

    Edit: If you really want to use only grep statements, you could nest them like this, though I'd advise you to use the more readable solution above.

    $a->cut unless
      grep {grep {$_->att('attr_2') eq '10' and $_->att('attr_3') eq '10'}
        $_->children('c')} $a->children('b[@attr_1="3"]');
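
    The attribute restriction on handlers mentioned at the top of this answer is not shown in the code above, so here is a minimal sketch of it. It assumes that <a> elements are never nested and that a plain closure variable is good enough to carry state between handlers. The names $keep and is_wanted_c and the inline sample XML are made up for illustration. The 'b[@attr_1="3"]' trigger fires only for matching <b> elements, and since a handler runs when its element closes, every <b> handler has already run by the time the enclosing <a> handler is called.

    ```perl
    use strict;
    use warnings;
    use XML::Twig;

    # Hypothetical sample input, cut down from the test file above.
    my $xml = <<'XML';
    <head>
      <a><b attr_1="3"><c attr_2="10" attr_3="10"/></b></a>
      <a><b attr_1="3"><c attr_2="10" attr_3="12"/></b></a>
      <a><b attr_1="2"><c attr_2="10" attr_3="10"/></b></a>
    </head>
    XML

    # True if a <c> element carries both wanted attribute values.
    # Missing attributes are treated as empty strings to avoid warnings.
    sub is_wanted_c {
        my ($c) = @_;
        return ( $c->att('attr_2') // '' ) eq '10'
            && ( $c->att('attr_3') // '' ) eq '10';
    }

    my $keep = 0;    # set once the current <a> contains a qualifying <b>

    my $twig = XML::Twig->new(
        pretty_print  => 'indented',
        twig_handlers => {
            # Called only for <b> elements whose attr_1 is "3".
            'b[@attr_1="3"]' => sub {
                my ( $t, $b_elt ) = @_;
                $keep ||= grep { is_wanted_c($_) } $b_elt->children('c');
            },
            # Called when </a> is reached, after all <b> handlers inside it.
            a => sub {
                my ( $t, $a_elt ) = @_;
                $a_elt->cut unless $keep;
                $keep = 0;    # reset for the next <a>
            },
        },
    );

    $twig->parse($xml);
    $twig->print;
    ```

    With this sample input, only the first <a> should be left in the printed document. Whether this reads better than a single handler on a is a matter of taste: the single-handler version keeps all the logic in one place, while the trigger-based version lets XML::Twig do the attr_1 filtering.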