Sorry guys, I may be asking a stupid question, but I am not very well-versed in Perl or, for that matter, in awk (shell programming).
My requirement is to filter XML based on some conditions.
For reference, here is a dummy XML file:
<TRADEEXT>
<TRADE origin = "AB" ref = "1" version = "1"/>
<TRADE origin = "AB" ref = "1" version = "2"/>
<TRADE origin = "ABC" ref = "1" version = "1"/>
</TRADEEXT>
The filter conditions are as follows:
1. Only those TRADEs which have origin = "AB" must be selected.
2. After applying the first condition, choose only the TRADE with the highest version for each ref (i.e. group by ref and keep the maximum version).
So the resulting XML with the filtered TRADEs must look like this:
<TRADEEXT>
<TRADE origin = "AB" ref = "1" version = "2"/>
</TRADEEXT>
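To make the two conditions concrete, here is a minimal core-Perl sketch that applies them to the dummy trades. It is not XML-aware: it assumes, hypothetically, that the TRADE elements have already been parsed into an array of hashes (the names @trades, @ab and %best are illustrative only).

```perl
use strict;
use warnings;

# Hypothetical in-memory form of the dummy XML: one hashref per <TRADE>.
my @trades = (
    { origin => 'AB',  ref => 1, version => 1 },
    { origin => 'AB',  ref => 1, version => 2 },
    { origin => 'ABC', ref => 1, version => 1 },
);

# Condition 1: keep only trades whose origin is exactly "AB".
my @ab = grep { $_->{origin} eq 'AB' } @trades;

# Condition 2: for each ref, remember the AB trade with the highest version.
my %best;    # ref => trade hashref
for my $t (@ab) {
    my $cur = $best{ $t->{ref} };
    $best{ $t->{ref} } = $t if !defined $cur || $t->{version} > $cur->{version};
}

# One surviving trade per ref, ordered by ref.
my @selected = @best{ sort { $a <=> $b } keys %best };

# prints: <TRADE origin="AB" ref="1" version="2"/>
print qq{<TRADE origin="$_->{origin}" ref="$_->{ref}" version="$_->{version}"/>\n} for @selected;
```

The ordering of the two steps matters: the origin filter runs first, so an ABC trade can never win the "highest version" comparison for a ref.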
I managed to filter the TRADEs whose origin is "AB" with the code below, but I am not able to filter the TRADEs based on the highest version for a given ref.
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;
my $twig = new XML::Twig( twig_handlers => { TRADE => \&TRADE } );
$twig->parsefile('1513.xml');
$twig->set_pretty_print('indented');
$twig->print_to_file('out.xml');
sub TRADE {
    my ($twig, $TRADE) = @_;
    foreach my $c ($TRADE) {
        $c->cut($TRADE) unless $c->att('origin') eq "AB";
    }
}
Any hint will be highly appreciated.
The clearest way to do this is to take two passes through the XML data: the first to find the maximum version of the AB trades for each ref, and the second to remove any elements that have a version less than that maximum.
This program uses the TRADE twig handler to build a hash %max_version of maximum versions per ref, counting only trades whose origin is AB. It doesn't affect the parsing of the data at all.
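The update rule inside trade_handler can be tried on its own. This sketch lifts it into a plain sub (record_version is a hypothetical name, not part of XML::Twig) and feeds it made-up (ref, version) pairs out of order, to show that the hash ends up holding the maximum per ref regardless of document order.

```perl
use strict;
use warnings;

my %max_version;

# Same rule as trade_handler: store $version unless a version at least
# as high is already recorded for this ref.
sub record_version {
    my ($ref, $version) = @_;
    unless (exists $max_version{$ref} and $max_version{$ref} >= $version) {
        $max_version{$ref} = $version;
    }
}

# Hypothetical (ref, version) pairs, deliberately out of order and covering
# two refs, as if seen one by one while the twig is being parsed.
record_version(@$_) for [1, 2], [1, 1], [2, 3], [1, 5], [2, 1];

# prints "ref 1 => max version 5" then "ref 2 => max version 3"
print "ref $_ => max version $max_version{$_}\n" for sort { $a <=> $b } keys %max_version;
```

Because only the running maximum is kept, memory use grows with the number of distinct refs, not with the number of TRADE elements.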
Then a for loop scans through all TRADE children of the root element TRADEEXT, using delete to remove all those that have a version other than the maximum.
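Comparing versions alone is not quite enough for the second pass: a non-AB trade that happens to share a ref and the maximum version with an AB trade would survive, and a ref that has no AB trade at all has no %max_version entry. This core-Perl sketch, on hypothetical data, shows the keep-test with the origin check placed first.

```perl
use strict;
use warnings;

# Hypothetical result of the first pass (AB trades only): ref => max version.
my %max_version = (1 => 2);

# Hypothetical trades: the first ABC trade shares ref 1 and version 2 with an
# AB trade; the second ABC trade uses a ref (7) that no AB trade ever set.
my @trades = (
    { origin => 'AB',  ref => 1, version => 1 },
    { origin => 'AB',  ref => 1, version => 2 },
    { origin => 'ABC', ref => 1, version => 2 },
    { origin => 'ABC', ref => 7, version => 1 },
);

# Keep a trade only if it is an AB trade AND has the maximum version for its
# ref. Testing origin first short-circuits before the undefined lookup for ref 7.
my @kept = grep {
    $_->{origin} eq 'AB' and $_->{version} == $max_version{ $_->{ref} }
} @trades;

# prints "1 trade(s) kept"
print scalar(@kept), " trade(s) kept\n";
```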
use strict;
use warnings;
use XML::Twig 3.48;
my $twig = new XML::Twig(
    twig_handlers => { '/TRADEEXT/TRADE' => \&trade_handler },
    att_accessors => [ qw/ origin ref version / ],
    pretty_print  => 'indented',
);
my %max_version;
$twig->parsefile('1513.xml');
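for my $trade ($twig->root->children('TRADE')) {
    my ($ref, $version) = ($trade->ref, $trade->version);
    # Keep only AB trades that carry the maximum version for their ref.
    # Testing origin first also avoids comparing against an undefined
    # %max_version entry for refs that have no AB trade at all.
    $trade->delete unless $trade->origin eq 'AB' and $version == $max_version{$ref};
}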
$twig->print_to_file('out.xml');
sub trade_handler {
    my ($twig, $trade) = @_;
    if ( $trade->origin eq 'AB' ) {
        my ($ref, $version) = ($trade->ref, $trade->version);
        unless (exists $max_version{$ref} and $max_version{$ref} >= $version) {
            $max_version{$ref} = $version;
        }
    }
    1;
}
Output:
<TRADEEXT>
<TRADE origin="AB" ref="1" version="2"/>
</TRADEEXT>