Select the 1st element only - with condition using XML::Twig


I have this code:

#!/usr/bin/env perl
use 5.014;
use warnings;
use XML::Twig;

my $twig = XML::Twig->parse( \*DATA );
$twig->set_pretty_print('indented_a');

# 1st search
# this correctly prints all <files> nodes whose <type> is 'release'
$_->print for ( $twig->findnodes( '//type[string()="release"]/..' ) );

# 2nd search
# try to get only the first match
my $latest = $twig->findnodes( '(//type[string()="release"])[1]/..' );
$latest->print;

__DATA__
<root>
    <files>
        <type>beta</type>
        <ver>3.0</ver>
    </files>
    <files>
        <type>alpha</type>
        <ver>3.0</ver>
    </files>
    <files>
        <type>release</type>
        <ver>2.0</ver>
    </files>
    <files>
        <type>release</type>
        <ver>1.0</ver>
    </files>
</root>

The above prints:

  <files>
    <type>release</type>
    <ver>2.0</ver>
  </files>
  <files>
    <type>release</type>
    <ver>1.0</ver>
  </files>
error in xpath expression (//type[string()="release"])[1]/.. around (//type[string()="release"])[1]/.. at /opt/anyenv/envs/plenv/versions/5.24.0/lib/perl5/site_perl/5.24.0/XML/Twig.pm line 3648.

The wanted output from the 2nd search is

    <files>
        <type>release</type>
        <ver>2.0</ver>
    </files>

i.e. the first <files> node whose <type> is 'release'.

According to this answer, the XPath expression (//type[string()="release"])[1]/.. should work, but it seems I have again missed something important.

Could anyone help, please?


Solution

  • XML::Twig doesn't support the full XPath syntax. The documentation for the get_xpath method (for which findnodes is an alias) says:

    A subset of the XPATH abbreviated syntax is covered:

    tag
    tag[1] (or any other positive number)
    tag[last()]
    tag[@att] (the attribute exists for the element)
    tag[@att="val"]
    tag[@att=~ /regexp/]
    tag[att1="val1" and att2="val2"]
    tag[att1="val1" or att2="val2"]
    tag[string()="toto"] (returns tag elements which text (as per the text method) 
                         is toto)
    tag[string()=~/regexp/] (returns tag elements which text (as per the text 
                            method) matches regexp)
    expressions can start with / (search starts at the document root)
    expressions can start with . (search starts at the current element)
    // can be used to get all descendants instead of just direct children
    * matches any tag
    

    So subexpressions within parentheses aren't supported, and you may specify only a single predicate.

    It's also important that, in scalar context, findnodes only ever returns a count of the nodes found. You must call it in list context to retrieve the nodes themselves, which means that a simpler way to find just the first matching element is to write

    my ($latest) = $twig->findnodes( '//type[string()="release"]/..' );
    

    which works fine.
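
    As a self-contained illustration of the list-context approach, here is a minimal sketch using plain XML::Twig. The data is a trimmed copy of the question's sample, inlined as a string, and the variable names ($xml, @releases, $latest_ver) are only illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Trimmed copy of the question's data, inlined as a string for this sketch.
my $xml = <<'XML';
<root>
    <files><type>beta</type><ver>3.0</ver></files>
    <files><type>release</type><ver>2.0</ver></files>
    <files><type>release</type><ver>1.0</ver></files>
</root>
XML

my $twig = XML::Twig->new;
$twig->parse($xml);

# List context: findnodes returns the matching <files> elements themselves.
my @releases = $twig->findnodes('//type[string()="release"]/..');

# The parentheses force a list assignment, so $latest receives the
# first matching element rather than a count.
my ($latest) = $twig->findnodes('//type[string()="release"]/..');

my $latest_ver = $latest->first_child_text('ver');
print "matches: ", scalar(@releases), "\n";
print "first release: $latest_ver\n";
```

    Only the supported XPath subset is used here; picking the first match is done in Perl rather than in the expression.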

    If you really need the full power of XPath, then you can use XML::Twig::XPath instead. This module uses either XML::XPath or the excellent XML::XPathEngine to provide the full XPath syntax by overloading findnodes. (The other methods get_xpath and find_nodes continue to use the reduced XML::Twig variation.)

    With XML::Twig::XPath, findnodes in scalar context returns an XML::XPathEngine::NodeSet object, which has array indexing overloaded. So you can write

    my $latest = $twig->findnodes( '//type[string()="release"]/..' );
    $latest->[0]->print;
    

    or just

    my ($latest) = $twig->findnodes( '//type[string()="release"]/..' );
    

    as above.
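
    For completeness, here is a hedged sketch of the XML::Twig::XPath route, assuming XML::XPathEngine (or XML::XPath) is installed. With the full XPath engine, the parenthesised expression from the question should be accepted as written. The sample data and the variable names are again only illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig::XPath;    # assumes XML::XPathEngine or XML::XPath is installed

# Trimmed copy of the question's data, inlined as a string for this sketch.
my $xml = <<'XML';
<root>
    <files><type>beta</type><ver>3.0</ver></files>
    <files><type>release</type><ver>2.0</ver></files>
    <files><type>release</type><ver>1.0</ver></files>
</root>
XML

my $twig = XML::Twig::XPath->new;
$twig->parse($xml);

# Full XPath: take the first matching <type> in the document, then its parent.
my ($first) = $twig->findnodes('(//type[string()="release"])[1]/..');
my $first_ver = $first->first_child_text('ver');

# Scalar context: a node set object that can be indexed like an array.
my $nodeset     = $twig->findnodes('//type[string()="release"]/..');
my $indexed_ver = $nodeset->[0]->first_child_text('ver');

print "first release (full XPath): $first_ver\n";
print "first release (node set):   $indexed_ver\n";
```

    Both lookups are meant to select the same <files> element; which form to use is mostly a matter of taste.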

    Finally, I would rather see /root/files[type[string()="release"]] than the trailing parent::node() step (the /..), but that is purely a personal preference.