I am trying to filter the records of an XML file, keeping only those whose contract ID appears in a CSV file.
The XML file looks like this:
<ROOTS02 xmlns="http://www.fja.com/RAN/RANTS02" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.fja.com/RAN/RANTS02 RANTS02.xsd">
<Record>
<Date>27.02.2023</Date>
<Year>2022</Year>
<ContractID>115000520</ContractID>
<Data>
... some more fields ...
</Data>
</Record>
<Record>
</Record>
....
</ROOTS02>
My Perl code looks like this:
#!/usr/bin/perl -w
use strict;
use XML::LibXML;
my $xml_parser = XML::LibXML->new();
my $xml_file="data0.xml";
my $vidfile="contractids.txt";
my $output="out0.xml";
my $roottag='ROOTS02';
my $rectag='Record';
my $filtertag='ContractID';
my $element;
my %vidtable;
readcontractids($vidfile);
print "Parsing input file $xml_file....";
my $xml_doc = $xml_parser->parse_file($xml_file);
#parsefile($input);
my $root = $xml_doc->documentElement();
my @records = $root->getElementsByTagName($rectag);
open(OUT, '>:encoding(UTF-8)', $output);
foreach my $record (@records) {
my $contract_id = $record->findvalue($filtertag);
if ( exists $vidtable{$contract_id} ) {
$record->unbindNode();
print OUT $record->toString();
}
}
close OUT;
print "Done!\n";
print "Output written to $output\n";
###########################################
sub readcontractids {
my $file=shift;
my $pidold;
my $pidnew;
open(FH, '<', $file) or die "Error: $file can't be read :$!";
while (<FH>) {
chomp $_;
if ( ! exists $vidtable{$_} ) {
$vidtable{$_}=$_;
}
}
close(FH);
}
It all works fine if I remove the attributes from the ROOTS02 tag in the first line of the XML file.
With the original first line of the XML file, which contains the attributes, the findvalue call for the tag 'ContractID' returns nothing:
perl -d ./t5.pl
Loading DB routines from perl5db.pl version 1.37
Editor support available.
Enter h or 'h h' for help, or 'man perldebug' for more help.
main::(./t5.pl:4): my $xml_parser = XML::LibXML->new();
DB<1> b 56
DB<2> r
main::(./t5.pl:56): if ( exists $vidtable{$contract_id} ) {
DB<2> Parsing input file rant0.xml....l 50-60
50 #parsefile($input);
51: my $root = $xml_doc->documentElement();
52: my @records = $root->getElementsByTagName($rectag);
53: open(OUT, '>:encoding(UTF-8)', $output);
54: foreach my $record (@records) {
55: my $contract_id = $record->findvalue($filtertag);
56==>b if ( exists $vidtable{$contract_id} ) {
57: $record->unbindNode();
58: print OUT $record->toString();
59 }
60 }
DB<3> p $rectag
Record
DB<4> p $filtertag
ContractID
DB<5> p $contract_id
DB<6> p $record
<Record>
<Date>27.02.2023</Date>
<Year>2022</Year>
<ContractID>115000520</ContractID>
...
What can I do to make it work even with the attributes in the root tag? How do the attributes influence the behavior of the libxml functions?
[I shall use the {namespace}name notation.]
You are looking for a {}ContractID node.
But the node in the document is a {http://www.fja.com/RAN/RANTS02}ContractID node.
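That's because the xmlns="http://www.fja.com/RAN/RANTS02" attribute sets the default namespace of the element it appears on and of all its descendant elements. XPath has no default namespace: an unprefixed name in an XPath expression always refers to the null namespace, so you need to register a prefix for the document's namespace and use it in the expression.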
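To see the mismatch for yourself, here is a minimal sketch that parses a cut-down copy of the document from an inline string (the string and the variable names are my own, not from the question). It prints the namespace that ContractID inherits from the root, then the result of an unprefixed XPath lookup.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical miniature of the document from the question.
my $xml = <<'XML';
<ROOTS02 xmlns="http://www.fja.com/RAN/RANTS02">
  <Record>
    <ContractID>115000520</ContractID>
  </Record>
</ROOTS02>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

# getElementsByTagName compares the qualified name, so it still finds the node.
my ($id_node) = $doc->documentElement->getElementsByTagName('ContractID');

# The element has no xmlns attribute of its own, yet it inherits the namespace.
my $ns_uri = $id_node->namespaceURI;
print "namespace: $ns_uri\n";

# An unprefixed XPath step only matches elements in no namespace.
my $unprefixed = $doc->findvalue('/ROOTS02/Record/ContractID');
print "unprefixed: '$unprefixed'\n";
```

Under these assumptions the first line should report the RANTS02 URI and the second an empty string, which is the same symptom as in the debugger session.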
use XML::LibXML qw( );
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs( r => "http://www.fja.com/RAN/RANTS02" );
my $doc = XML::LibXML->new->parse_file( "data0.xml" );
for my $rec_node ( $xpc->findnodes( "/r:ROOTS02/r:Record", $doc ) ) {
my $contract_id = $xpc->findvalue( "r:ContractID", $rec_node );
...
}
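Applied to the filtering task from the question, this might look roughly as follows. It is only a sketch: the inline XML and the %wanted hash are hypothetical stand-ins for data0.xml and contractids.txt, and the matching records are collected in an array rather than written to a file.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical inline data standing in for data0.xml and contractids.txt.
my $xml = <<'XML';
<ROOTS02 xmlns="http://www.fja.com/RAN/RANTS02">
  <Record><ContractID>115000520</ContractID></Record>
  <Record><ContractID>999</ContractID></Record>
</ROOTS02>
XML
my %wanted = map { $_ => 1 } qw( 115000520 );

my $doc = XML::LibXML->load_xml( string => $xml );

# Bind the prefix "r" to the document's default namespace.
my $xpc = XML::LibXML::XPathContext->new();
$xpc->registerNs( r => 'http://www.fja.com/RAN/RANTS02' );

# Keep the serialized form of every record whose contract ID is wanted.
my @kept;
for my $rec_node ( $xpc->findnodes( '/r:ROOTS02/r:Record', $doc ) ) {
    my $contract_id = $xpc->findvalue( 'r:ContractID', $rec_node );
    push @kept, $rec_node->toString() if exists $wanted{$contract_id};
}

print "$_\n" for @kept;
```

The prefix "r" is arbitrary. It only has to match between registerNs and the XPath expressions, and it does not need to appear in the document itself.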