XML::Twig - set_text without clobbering structure


The XML::Twig documentation for the set_text method comes with a warning:

set_text ($string) Set the text for the element: if the element is a PCDATA, just set its text, otherwise cut all the children of the element and create a single PCDATA child for it, which holds the text.

So suppose I want to do something simple, such as changing the case of all the text in my XML document:

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig->new(
    'pretty_print'  => 'indented_a',
    'twig_handlers' => {
        '_all_' => sub {
            my $newtext = $_->text_only;
            $newtext =~ tr/a-z/A-Z/;    # tr takes plain ranges; the brackets were redundant
            $_->set_text($newtext);
        }
    }
);
$twig->parse( \*DATA );
$twig->print;

__DATA__
<root>
    <some_content>fish
        <a_subnode>morefish</a_subnode>
    </some_content>
    <some_more_content>cabbage</some_more_content>
</root>

Because set_text replaces an element's children, this gets clobbered into:

<root></root>

But if I target just one bottom-level node (e.g. a_subnode), it works fine.

Is there an elegant way to replace/transform text within an element without clobbering the data structure below it? I mean, I can test for the presence of children or something similar, but it seems like there should be a better way of doing this. (A different library, maybe?)

(For the sake of clarity: transliterating all the text in a document is just my example. My actual use case is rather more convoluted, but is still about in-place text transformation.)

I'm considering a cut-and-paste approach (cut all children, replace the text, paste the children back), but that seems inefficient.


Solution

  • Instead of putting the handler on _all_, put it only on text elements (#TEXT) and change text_only to text. A text element has no children to cut, so set_text just sets its text. It should work.

    Update: or use the char_handler option when you create the twig, in place of the handler: char_handler => sub { uc shift }
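
Both suggestions above can be sketched as follows. This is a minimal illustration, not the answerer's own code: the helper names upcase_with_text_handler and upcase_with_char_handler and the one-line sample document are made up for the example, and it assumes the '#TEXT' trigger behaves as the answer describes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::Twig;

# Hypothetical helper: upper-case every text node via a '#TEXT' handler.
# The handler sees only text elements, so set_text never cuts child elements.
sub upcase_with_text_handler {
    my ($xml) = @_;
    my $twig = XML::Twig->new(
        twig_handlers => {
            '#TEXT' => sub {
                # $_ is the text element itself; text (not text_only) is enough here
                $_->set_text( uc $_->text );
            },
        },
    );
    $twig->parse($xml);
    return $twig->sprint;
}

# Hypothetical helper: the same transformation with char_handler, which
# receives each piece of character data and returns the string to store.
sub upcase_with_char_handler {
    my ($xml) = @_;
    my $twig = XML::Twig->new( char_handler => sub { uc shift } );
    $twig->parse($xml);
    return $twig->sprint;
}

# Sample input modelled on the question's document (whitespace removed)
my $xml = '<root><some_content>fish<a_subnode>morefish</a_subnode></some_content>'
        . '<some_more_content>cabbage</some_more_content></root>';

print upcase_with_text_handler($xml), "\n";
print upcase_with_char_handler($xml), "\n";
```

In either variant the a_subnode element should survive inside some_content, because only the text is rewritten. For a transformation more involved than uc, swap in your own sub; the char_handler form is the shorter of the two when the new text depends only on the old string and not on where it sits in the tree.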