
XML::Twig: how to specify a twig_handler for a <_tag> that starts with an underscore?


I have an XML file whose root element tag is <__> (two underscores). However, when that tag name is used as a key in the twig_handlers hash, XML::Twig->new dies with the error message:

unrecognized expression in handler: '__'

In fact, ANY tag name starting with an underscore produces this error, except for Twig's special triggers _all_ and _default_. I can use either of those to process the file, but only at the expense of throwing away all the handler callbacks except the last.

The invocation which fails is:

XML::Twig->new (twig_handlers => { '__' => \&show })

I imagine there's an XML::Twig XPath expression that could be used here, but the CPAN documentation is pretty vague about its syntax. I also now wonder what I'd have to do to match an element named <_all_> :)

If anyone has a suggestion it would be much appreciated.

The problem only occurs when the twig is created: once processing has started (using the _all_ trigger), <__> elements at any level of the input are processed normally.
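Because _all_ is accepted and <__> elements are processed normally once parsing starts, one possible workaround on affected versions is to register only _all_ and dispatch on the element name inside the handler. The sketch below assumes that approach; the %handler_for table and @matched array are hypothetical names for illustration, not part of XML::Twig. Since the lookup is done on $elt->gi, the same table could also hold a plain '_all_' key to catch an element literally named <_all_>.

```perl
use strict;
use warnings;
use XML::Twig;

my $xml = <<'EOS';
<__>
   <AA>first</AA>
   <__>second</__>
</__>
EOS

my @matched;    # text of each <__> element the dispatcher handled

# Hypothetical dispatch table keyed on plain tag names. These keys never
# reach XML::Twig's trigger parser, so leading underscores are harmless.
my %handler_for = (
    '__' => sub { my ($t, $elt) = @_; push @matched, $elt->text; },
);

my $twig = XML::Twig->new(
    twig_handlers => {
        # _all_ fires for every element once it is fully parsed;
        # route to the matching entry in %handler_for, if any.
        _all_ => sub {
            my ($t, $elt) = @_;
            if (my $h = $handler_for{ $elt->gi }) {
                $h->($t, $elt);
            }
            return 1;    # keep processing any other handlers
        },
    },
);
$twig->parse($xml);
```

Twig handlers run when an element's end tag is seen, so the inner <__> is handled before the outer one.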

If anyone wants to play with the problem, here's the program I was using to try finding a solution. Set $xpath to the expression you want to test.

use strict;
use warnings;
use XML::Twig;

my $xpath = '_all_';    # <---- fails if one puts '__' here

my $xml = <<EOS;        # <---- here's the XML data to process
<__>
   <AA>first</AA>
   <__>second</__>
</__>
EOS

sub show {
    print "handler called for element ", $_->gi, ", whose children are\n";
    my @children = $_->children;
    for my $elt (@children) {
        print "\t", $elt->gi, " holds \"", $elt->text, "\"\n";
    }
    1;
}

my $twig = XML::Twig->new (twig_handlers => { $xpath => \&show });
$twig->parse ($xml);

Solution

  • Which version of XML::Twig are you using? This is a bug that was fixed in version 3.38.

    From the Changes file:

    version 3.38
    date: 2011-02-27
    # minor maintenance release
    fixed: RT 65865: _ should be allowed at the start on an XML name
           https://rt.cpan.org/Ticket/Display.html?id=65865
           reported by Steve Prokopowich
    

    And indeed when I use '__' as the value for $xpath the code runs without errors, and gives the correct output.
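    A small probe like the following (a sketch, not taken from the XML::Twig documentation) reports whether the installed XML::Twig accepts '__' as a handler trigger, and whether its version is at least 3.38, where the Changes file says the fix landed. To fail early instead, a script could simply say `use XML::Twig 3.38;`.

```perl
use strict;
use warnings;
use XML::Twig;

# Does new() accept a trigger starting with an underscore?
# Before the RT 65865 fix it died with "unrecognized expression in handler".
my $accepts_underscore = eval {
    XML::Twig->new(twig_handlers => { '__' => sub { 1 } });
    1;
} ? 1 : 0;

# UNIVERSAL::VERSION dies if the loaded module is older than requested.
my $is_fixed_version = eval { XML::Twig->VERSION('3.38'); 1 } ? 1 : 0;

printf "XML::Twig %s %s '__' as a handler trigger\n",
    $XML::Twig::VERSION, $accepts_underscore ? 'accepts' : 'rejects';
```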