Search code examples
xmlperlperl-modulexml-twig

XML::Twig change tag or create element based on node existence


I have a XML (example) file: test.xml

<root>
   <tag1>AAA</tag1>
   <tag2>BBB</tag2>
   <tag3>
      <tag4>DDD</tag4>
   </tag3>
</root>

The result I want to achieve is, set two variables (from input): i.e.:

my $xpath = '/root/tag3/tag4';   # or '/root/tag2/tag5' or '/root/tag6'
my $xvalue = 'CCC';              # or 'EEE'

The script would check the $xpath variable, if it exists in the XML file, then it changes the text of it. If it doesn't exist in the XML file, then it creates the element with $xpath and $xvalue.

I use below script to set the text for $xpath, but how to modify it so that it would do proper things based on the $xpath existence? Thanks a lot,

open( my $output, '>', "$ofile") or die "cannot create $ofile: $!";
XML::Twig->new( twig_roots => { "$xpath" =>
                               sub { my $text= $_->text();
                                     $_->set_text($xvalue);
                                     $_->flush;
                                   },
                             },
            twig_print_outside_roots => $output,
            pretty_print => 'indented',
          )
          ->parsefile( "test.xml" );

Solution

  • It's a fairly simple task using a recursive subroutine

    In the program below, each call to add_xpath advances the value of $node and removes one step from the XPath expression in the $path parameter

    • If the path begins with a slash and a tag name then the tag name is checked to make sure it matches the name of the root element. Then the current node is set to the root element and the subroutine recurses

    • If the path starts immediately with a tag name, then has_child is called to see if a child of that name already exists. If not then insert_new_elt adds one for us. The current node is set to the new or pre-existing child node and the subroutine recurses

    • Otherwise the path should be empty, and it is checked to make sure. Then set_text is called to set the text contents of the currenty node and the recursion terminates

    The output show the resulting XML structure after each of the three operations that you show in your question

    use strict;
    use warnings;
    
    use XML::Twig;
    use Carp;
    
    my $twig = XML::Twig->new;
    $twig->parsefile('test.xml');
    $twig->set_pretty_print('indented');
    print $twig->sprint, "\n";
    
    add_xpath($twig->root, '/root/tag3/tag4', 'CCC');
    print $twig->sprint, "\n";
    
    add_xpath($twig->root, '/root/tag2/tag5', 'EEE');
    print $twig->sprint, "\n";
    
    add_xpath($twig->root, '/root/tag6', 'GGG');
    print $twig->sprint, "\n";
    
    sub add_xpath {
        my ($node, $path, $value) = @_;
    
        if ( $path =~ s|^/(\w+)/?|| ) {
            my $tag = $1;
            $node = $node->root;
            carp "Root element has wrong tag name" unless $node->tag eq $tag;
        }
        elsif ( $path =~ s|^(\w+)/?|| ) {
            my $tag = $1;
            if ( my $child = $node->has_child($tag) ) {
                $node = $child;
            }
            else {
                $node = $node->insert_new_elt('last_child', $tag);
            }
        }
        else {
            carp qq{Invalid path at "$path"} if $path =~ /\S/;
            $node->set_text($value);
            return 1;
        }
    
        add_xpath($node, $path, $value);
    }
    

    output

    <root>
      <tag1>AAA</tag1>
      <tag2>BBB</tag2>
      <tag3>
        <tag4>DDD</tag4>
      </tag3>
    </root>
    
    <root>
      <tag1>AAA</tag1>
      <tag2>BBB</tag2>
      <tag3>
        <tag4>CCC</tag4>
      </tag3>
    </root>
    
    <root>
      <tag1>AAA</tag1>
      <tag2>BBB<tag5>EEE</tag5></tag2>
      <tag3>
        <tag4>CCC</tag4>
      </tag3>
    </root>
    
    <root>
      <tag1>AAA</tag1>
      <tag2>BBB<tag5>EEE</tag5></tag2>
      <tag3>
        <tag4>CCC</tag4>
      </tag3>
      <tag6>GGG</tag6>
    </root>