Using XML::Twig to filter XML in Perl


I am stuck and banging my head against this. I have to delete unwanted TRADE elements from a huge XML file.

<TRADEEXT>
  <TRADE origin="1" version="1">
     <EVENT externtype="PROC"/>
     <EVENT externtype="PROCC"/>
  </TRADE>
  <TRADE origin="1" version="1">
     <EVENT externtype="PROCC"/>
  </TRADE>
</TRADEEXT>

The only EVENT in the second TRADE has externtype 'PROCC', which is not a legitimate value (the legitimate value is 'PROC').

Hence the final output should be

<TRADEEXT>
   <TRADE origin="1" version="1">
      <EVENT externtype="PROC"/>
      <EVENT externtype="PROCC"/>
   </TRADE>
</TRADEEXT>

which should be written to a new file. The most important point is that a TRADE is legitimate as long as at least one of its EVENTs has a legitimate value, even if its other EVENTs have illegal values. The first TRADE above is kept for that reason.

My code is

use strict;
use warnings;
use XML::Twig;

my $twig = new XML::Twig( twig_handlers => { TRADE => \&TRADE } );
$twig->parsefile('1513.xml');
$twig->set_pretty_print('indented');
$twig->print_to_file('out.xml');

sub TRADE {
    my ( $twig, $TRADE ) = @_;
    foreach my $c ( $TRADE->children('EVENT') ) {
        $c->cut($TRADE) unless $c->att('eventtype') eq "PROC";
    }
}

Unfortunately, it deletes the EVENT elements instead of the TRADE element.

Any hints would be appreciated.


Solution

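  • I don't know XML::Twig. In XML::LibXML, you'd select every TRADE that has no EVENT with externtype "PROC" and remove it from its parent:

    use XML::LibXML;
    my $doc = XML::LibXML->load_xml( location => '1513.xml' );
    for my $bad_trade ( $doc->findnodes('/TRADEEXT/TRADE[ not(EVENT/@externtype = "PROC") ]') ) {
        $bad_trade->parentNode->removeChild($bad_trade);
    }
    $doc->toFile('out.xml');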
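
For comparison, here is a rough XML::Twig sketch of the same rule. It is only a sketch under a few assumptions: the input is well-formed XML with quoted attribute values, and the helper names `trade_is_legit` and `filter_trades` are made up for illustration. The handler decides per TRADE, after all of its EVENT children have been parsed, and deletes the whole TRADE rather than cutting individual EVENTs. Note that it reads the `externtype` attribute, not `eventtype`.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper: true if the TRADE has at least one EVENT child
# whose externtype attribute is exactly "PROC".
sub trade_is_legit {
    my ($trade) = @_;
    for my $event ( $trade->children('EVENT') ) {
        my $type = $event->att('externtype');
        return 1 if defined $type && $type eq 'PROC';
    }
    return 0;
}

# Hypothetical helper: parses an XML string, drops every TRADE without
# a legitimate EVENT, and returns the filtered document as a string.
sub filter_trades {
    my ($xml) = @_;
    my $twig = XML::Twig->new(
        pretty_print  => 'indented',
        twig_handlers => {
            # Called once each TRADE element has been fully parsed.
            TRADE => sub {
                my ( $t, $trade ) = @_;
                $trade->delete unless trade_is_legit($trade);
            },
        },
    );
    $twig->parse($xml);
    return $twig->sprint;
}
```

To work on files as in the question, `$twig->parsefile('1513.xml')` followed by `$twig->print_to_file('out.xml')` should slot in where `parse` and `sprint` are used here. If memory is a concern with a huge file, XML::Twig's `flush` or `purge` methods are usually called from the handler to release elements that have already been processed. I left that out to keep the sketch short.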