Using LibXML and XPath To Find Node With Colon (Local Namespace)


I'm trying to get the attribute @id1 from <Incoming> in the below XML:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Incomings xmlns:ns2="http://testme.org/foo/schema">
    <Incoming id1="6bbaec22" id2="928c2081">
        <ns2:Address>fubar@test.com</ns2:Address>
    </Incoming>
</Incomings>

The only information that I can pass in is the email address fubar@test.com.

I'm using XML::LibXML and XML::LibXML::XPathContext as below:

my $dom = XML::LibXML->new->parse_file( $xml_file );  # XML contains as above
my $xpc = XML::LibXML::XPathContext->new( $dom->documentElement );
$xpc->registerNs('x', 'http://testme.org/foo/schema');

my $email = 'fubar@test.com';
my $xpath = "/x:Incomings/x:Incoming/x:ns2:Address[text()='$email']/../\@id1";
my @nodes = $xpc->findnodes( $xpath );

But it always gives me an invalid expression in $xpath around the ns2:Address.

What mistake did I make above? If the node name is only <Address>, then removing the ns2: from my $xpath statement gives me the correct values in @nodes.

Thanks!


Solution

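  • I think there are two problems here. First off, XPath expressions find nodes. You can search based on the existence and content of an attribute, but findnodes gives you node objects (elements, or attribute nodes in XML::LibXML), not the text content, so you still have to pull the value out of whatever node comes back.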
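
    To show the node-versus-value distinction in XML::LibXML terms, here is a minimal sketch. It assumes the question's XML is inlined as a string (the `$xml_string` name is mine, not from the question) and that the prefix x is bound to the ns2 URI. findnodes hands back a node you then ask for its value, while findvalue returns the string directly:

    ```perl
    use strict;
    use warnings;
    use XML::LibXML;

    # Assumption: the question's document, inlined so the sketch is self-contained.
    my $xml_string = q{<Incomings xmlns:ns2="http://testme.org/foo/schema">}
        . q{<Incoming id1="6bbaec22" id2="928c2081">}
        . q{<ns2:Address>fubar@test.com</ns2:Address>}
        . q{</Incoming></Incomings>};

    my $dom = XML::LibXML->load_xml( string => $xml_string );
    my $xpc = XML::LibXML::XPathContext->new($dom);
    $xpc->registerNs( 'x', 'http://testme.org/foo/schema' );

    my $xpath = q{//x:Address[text()='fubar@test.com']/../@id1};

    # findnodes returns node objects (here a single attribute node) ...
    my ($attr) = $xpc->findnodes($xpath);
    my $from_node = $attr->value;

    # ... while findvalue returns the string value of the result.
    my $from_value = $xpc->findvalue($xpath);

    print "$from_node $from_value\n";
    ```

    Both variables should end up holding the id1 value of the matching Incoming element.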

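    Secondly, you can't stack namespace prefixes in XML: a name takes at most one prefix, so x:ns2:Address isn't valid. The x prefix you registered stands for the same URI that ns2 stands for in the document, so it replaces ns2 rather than going in front of it. Incomings and Incoming aren't in that namespace at all, so they shouldn't carry x: either. Do you actually need to register your x namespace there? You may not need to at all (e.g. based on your small XML snippet).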
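
    To see which elements are actually in the namespace, you can ask XML::LibXML what it parsed. This is only a sketch, with the question's XML inlined as a string (`$xml_string` and `@report` are names I made up). It lists each element's name next to its namespace URI:

    ```perl
    use strict;
    use warnings;
    use XML::LibXML;

    # Assumption: the question's document, inlined so the sketch is self-contained.
    my $xml_string = q{<Incomings xmlns:ns2="http://testme.org/foo/schema">}
        . q{<Incoming id1="6bbaec22" id2="928c2081">}
        . q{<ns2:Address>fubar@test.com</ns2:Address>}
        . q{</Incoming></Incomings>};

    my $dom = XML::LibXML->load_xml( string => $xml_string );

    my @report;
    for my $node ( $dom->findnodes('//*') ) {
        # namespaceURI is undef for elements that are in no namespace.
        my $uri = $node->namespaceURI // '(no namespace)';
        push @report, sprintf '%s => %s', $node->nodeName, $uri;
    }
    print "$_\n" for @report;
    ```

    Only the Address element reports the schema URI, which is why it is the only step in the path that needs a prefix.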

    Can I offer an alternative option? Because you're working with Perl, you don't necessarily need to do everything via the XPath expression.

    I'd perhaps go with findnodes followed by grep:

    NB: Using XML::Twig for illustration; I'm pretty sure something very similar works in XML::LibXML.

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use XML::Twig;
    
    my $twig = XML::Twig->new( 'pretty_print' => 'indented_a' )->parse( \*DATA );
    
    # Keep only Address elements whose text matches the email (dot escaped so it is literal).
    my @elt_list = grep { $_->trimmed_text =~ m{fubar\@test\.com} }
        ( $twig->findnodes('//ns2:Address') );
    
    foreach my $elt (@elt_list) {
        print $elt -> parent -> att('id1');
    }
    
    
    __DATA__
    <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <Incomings xmlns:ns2="http://testme.org/foo/schema">
        <Incoming id1="6bbaec22" id2="928c2081">
            <ns2:Address>fubar@test.com</ns2:Address>
        </Incoming>     
    </Incomings>
    

    I'd also note that your XPath can select an element rather than an attribute, so you can select on 'elements with an id1 attribute' like this:

    my @elt_list = ( $twig->findnodes("//ns2:Address[string()='$email']/../.[\@id1]") );
    
    foreach my $elt (@elt_list) {
        print $elt -> att('id1');
    }
    

    It depends rather on how specific you want to be with your findnodes search. Based on what you've provided in that snippet, your approach is much more complicated than it needs to be, and you could simply do:

    use XML::Twig;
    
    my $twig = XML::Twig->new->parsefile('your_file.xml');
    print $twig -> findnodes('//Incoming',0)->att('id1'),"\n";
    

    Or:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use XML::LibXML;
    
    my $xml = XML::LibXML->new->parse_file( 'sample2.xml' );
    foreach my $node (  $xml -> findnodes( '//Incoming' ) ) {
       print $node ->getAttribute('id1'), "\n";
    } 
    

    Or with a bit of grepping:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use XML::LibXML;
    
    my $email = 'fubar@test.com';
    my $xml = XML::LibXML->new->parse_file( 'sample2.xml' );
    # \Q...\E quotes regex metacharacters in the address (e.g. the dot).
    foreach my $node ( grep { $_ -> textContent =~ m{\Q$email\E} } $xml -> findnodes( '//Incoming' ) ) {
       print $node ->getAttribute('id1'), "\n";
    } 
    

    If you particularly want to use that x namespace prefix, though, this works:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use XML::LibXML;
    
    my $xml   = XML::LibXML->new->parse_file('sample2.xml');
    my $xpc   = XML::LibXML::XPathContext->new( $xml->documentElement );
    $xpc->registerNs( 'x', 'http://testme.org/foo/schema' );
    
    my $email = 'fubar@test.com';
    my ( $id1 ) = map { $_ -> getAttribute('id1') // () } $xpc->findnodes("/Incomings/Incoming/x:Address[text()='$email']/..");
    print $id1,"\n";
    

    (This also works if I mock up some XML with multiple 'Incoming' nodes, selecting the first with the right email address. Note that // needs Perl 5.10 or later and tests for 'defined'. You could probably substitute || on older versions, which tests for true/false; the only places where the two differ are empty strings and zeros.)
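
    To make the // versus || difference concrete, here is a small sketch in plain Perl. The sample values are made up for illustration and have nothing to do with the XML above:

    ```perl
    use strict;
    use warnings;

    # Values that are defined but false, plus undef and an ordinary string.
    my @samples = ( '0', '', undef, 'abc' );

    my ( @defined_or, @logical_or );
    for my $value (@samples) {
        # // falls back only when the left side is undef.
        push @defined_or, $value // 'fallback';
        # || falls back whenever the left side is false ('0', '', undef).
        push @logical_or, $value || 'fallback';
    }

    print join( ',', map {"[$_]"} @defined_or ), "\n";   # [0],[],[fallback],[abc]
    print join( ',', map {"[$_]"} @logical_or ), "\n";   # [fallback],[fallback],[fallback],[abc]
    ```

    So with || in the map above, an id1 attribute whose value is "0" or empty would be silently dropped, whereas // keeps it.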