Unable to filter XML based on given conditions


Sorry guys, this may be a stupid question, but I am not well versed in Perl or, for that matter, in awk (shell programming).

My requirement is to filter XML based on some conditions.

For reference I am providing a dummy XML:

<TRADEEXT>
    <TRADE origin = "AB"  ref = "1" version = "1"/>
    <TRADE origin = "AB"  ref = "1" version = "2"/>    
    <TRADE origin = "ABC" ref = "1" version = "1"/>    
</TRADEEXT>

Now the filter conditions are as follows :

  1. Only those TRADES which have origin = "AB" must be selected.

  2. After applying the first condition, keep for each ref only the TRADE with the highest version (that is, group by ref and take the maximum version).

So the resultant XML with filtered TRADES must look like

<TRADEEXT>
    <TRADE origin = "AB" ref = "1" version = "2"/>    
</TRADEEXT>
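
The two conditions amount to a filter followed by a "maximum per group" selection. As a minimal sketch of just that logic, here it is on plain Perl hashes, with no XML parsing involved. The `@trades` list is a hypothetical in-memory copy of the sample attributes, and the variable names are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical in-memory copy of the TRADE attributes from the sample XML.
my @trades = (
    { origin => 'AB',  ref => 1, version => 1 },
    { origin => 'AB',  ref => 1, version => 2 },
    { origin => 'ABC', ref => 1, version => 1 },
);

# Condition 1: keep only trades whose origin is exactly "AB".
my @ab = grep { $_->{origin} eq 'AB' } @trades;

# Condition 2: for each ref, remember the trade with the highest version.
my %latest;
for my $trade (@ab) {
    my $seen = $latest{ $trade->{ref} };
    $latest{ $trade->{ref} } = $trade
        if !$seen || $trade->{version} > $seen->{version};
}

# One surviving trade per ref, in a stable order.
my @result = map { $latest{$_} } sort keys %latest;
printf "origin=%s ref=%s version=%s\n", @{$_}{qw(origin ref version)}
    for @result;
```

Versions are compared numerically here, which assumes the version attribute always holds a number.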

I managed to filter the TRADES whose origin is "AB", as shown in the code below, but I am not able to filter the TRADES based on the highest version for a given ref.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig->new( twig_handlers => { TRADE => \&TRADE } );
$twig->parsefile('1513.xml');
$twig->set_pretty_print('indented');
$twig->print_to_file('out.xml');

sub TRADE {
    my ($twig, $TRADE) = @_;
    # Drop every TRADE whose origin is not "AB"
    $TRADE->cut unless $TRADE->att('origin') eq "AB";
}

Any hint will be highly appreciated.


Solution

  • The clearest way to do this is to take two passes through the XML data: the first to find the maximum version for each ref, and the second to remove any elements that have a version less than the maximum.

    This program uses the TRADE twig handler to build a hash %max_version of maximum versions per ref. It doesn't affect the parsing of the data at all.

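    Then a for loop scans through all TRADE children of the root element TRADEEXT, using delete to remove every element that is not from origin AB or that has a version other than the maximum for its ref. The origin check matters in this second pass: without it, a non-AB trade that happens to share the maximum version of an AB trade with the same ref would be kept.

    use strict;
    use warnings;
    
    use XML::Twig 3.48;
    
    my $twig = XML::Twig->new(
        twig_handlers => { '/TRADEEXT/TRADE' => \&trade_handler },
        att_accessors => [ qw/ origin ref version / ],
        pretty_print  => 'indented',
    );
    
    my %max_version;
    
    $twig->parsefile('1513.xml');
    
    for my $trade ($twig->root->children('TRADE')) {
      my ($ref, $version) = ($trade->ref, $trade->version);
      # Keep only AB trades that carry the maximum version for their ref
      $trade->delete
        unless $trade->origin eq 'AB' and $version == $max_version{$ref};
    }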
    
    $twig->print_to_file('out.xml');
    
    sub trade_handler {
      my ($twig, $trade) = @_;
    
      if ( $trade->origin eq 'AB' ) {
    
        my ($ref, $version) = ($trade->ref, $trade->version);
    
        unless (exists $max_version{$ref} and $max_version{$ref} >= $version) {
          $max_version{$ref} = $version;
        }
      }
    
      1;
    }
    

    output

    <TRADEEXT>
      <TRADE origin="AB" ref="1" version="2"/>
    </TRADEEXT>
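
As a variation on the two-pass approach, the same result can probably be reached in a single pass by remembering, per ref, the best TRADE element seen so far and deleting whatever it supersedes. The following is only a sketch under assumptions: the handler name `keep_latest` and the `%best` hash are made up for illustration, the sample XML is inlined instead of being read from `1513.xml`, and versions are assumed to be numeric.

```perl
use strict;
use warnings;
use XML::Twig;

# Sample document inlined for the sketch; the original reads 1513.xml.
my $xml = <<'XML';
<TRADEEXT>
    <TRADE origin="AB"  ref="1" version="1"/>
    <TRADE origin="AB"  ref="1" version="2"/>
    <TRADE origin="ABC" ref="1" version="1"/>
</TRADEEXT>
XML

my %best;    # ref => TRADE element with the highest version seen so far

# Hypothetical handler: decides the fate of each TRADE as it is parsed.
sub keep_latest {
    my ($twig, $trade) = @_;

    # Trades from any other origin are dropped immediately.
    if ($trade->att('origin') ne 'AB') {
        $trade->delete;
        return 1;
    }

    my $ref  = $trade->att('ref');
    my $prev = $best{$ref};

    if ($prev && $prev->att('version') >= $trade->att('version')) {
        $trade->delete;            # an equal or newer version is already kept
    }
    else {
        $prev->delete if $prev;    # superseded by the current trade
        $best{$ref} = $trade;
    }
    return 1;
}

my $twig = XML::Twig->new(
    twig_handlers => { '/TRADEEXT/TRADE' => \&keep_latest },
    pretty_print  => 'indented',
);
$twig->parse($xml);

my @kept = $twig->root->children('TRADE');
$twig->print;
```

The trade-off is that the handler now modifies the tree while it is being built, which is arguably harder to follow than the two-pass version; on the other hand, the tree never holds more than one AB trade per ref.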