
Understanding XML::Twig's wrap_in


I am looping through a twig's descendants, and in this loop I want to create new twigs to output later. Those new twigs are basically wrapped versions of the current looped item. Something like this:

# $twig already exists.
my @descendants = $twig->root->first_child->descendants_or_self;
foreach (@descendants) {
  $_->root->wrap_in('tree');

  my $treetop = XML::Twig->new()->set_root($_);

  $treetop->root->wrap_in('trees', treebank => {
    id => 'someid'
  });

  if (exists $hash{'somekey'}) {
    $treetop->root->set_att(c => 'd');
  }
}

An example of $_->sprint in the loop:

<node begin="0">
  <node a="b"></node>
</node>

However, the result of this (after the last if-clause) is ($treetop->sprint):

<node begin="0" c="d">
  <node a="b"></node>
</node>

In other words, the attribute is added to the initial 'root', and no wrapping happens. But what I'm trying to achieve is:

<treebank id="someid" c="d">
  <trees>
    <tree>
      <node begin="0">
        <node a="b"></node>
      </node>
    </tree>
  </trees>
</treebank>

Interestingly, when I call $_->root I get to see the original root ($twig's root), so I guess the root is implicitly inherited as part of the object. I think that that's where most of my confusion lies: root of the special $_ is actually the root of $twig and not the root of the sub tree itself.

What is the right way to take an input twig descendant and turn it into a twig with a wrapping structure?


Solution

  • Normally when trying to create subdocuments like that, I just create a new one, and insert a copied node.

    Something like this:

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    
    use XML::Twig;
    
    my $twig = XML::Twig->new->parse( \*DATA );
    
    foreach my $node ( $twig->get_xpath('./node') ) {
    
       my $new_root =
         XML::Twig::Elt->new( 'treebank', { id => "someid", c => "d" } );
       my $new_doc = XML::Twig->new->set_root($new_root);
       $new_doc->set_xml_version('1.0');
       my $tree = $new_doc->root->insert_new_elt('trees')->insert_new_elt('tree');
    
       $node->cut;
       $node->paste( 'last_child', $tree );
    
       $new_doc->set_pretty_print('indented');
       $new_doc->print;
    }
    
    __DATA__
    <xml>
      <node begin="0" c="d">
        <node a="b"></node>
      </node>
    </xml>
    

    But to address your specific points: yes, root does give you the document root. The root is a special-case XML element, and calling root on any node points you at the top level, because the document is part of the node's context.
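To make the point about root concrete, here is a minimal sketch. The sample XML and the variable names are mine, chosen for illustration. It shows that calling root on a nested element returns the top of the document the element lives in, not the element itself:

```perl
use strict;
use warnings;
use XML::Twig;

# Sample input chosen for illustration only.
my $twig = XML::Twig->new->parse(
    '<xml><node begin="0"><node a="b"></node></node></xml>');

my $outer = $twig->root->first_child('node');
my $inner = $outer->first_child('node');

# root() reports the top of the tree the element belongs to,
# so both elements give the same document root.
print $outer->root->tag,   "\n";    # xml
print $inner->root->tag,   "\n";    # xml
print $inner->parent->tag, "\n";    # node
```

This is why $_->root in the question's loop keeps landing on $twig's root: a descendant stays attached to the original tree until it is cut or copied.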

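    wrap_in modifies a node in place, but it won't do what you want when called on the original document's root, because the root is a special case. So you could instead cut the node, make it the root of a new document, and wrap it there (using my example above):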

    foreach my $node ( $twig->get_xpath('./node') ) {
       my $new_doc = XML::Twig->new;
       $new_doc->set_xml_version('1.0');
    
       $node->cut;
       $new_doc->set_root ($node);
       $node->wrap_in( 'trees', treebank => { id => 'someid' } );
       $new_doc->set_pretty_print('indented');
       $new_doc->print;
    }
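If you loop over descendants_or_self as in the question, cutting each node also removes its nested nodes from the source, so a deep copy may be the safer choice. The following is only a sketch under that assumption: wrap_in_treebank is a hypothetical helper of mine (not part of XML::Twig), and %hash stands in for the hash from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (not an XML::Twig method): build a standalone
# treebank/trees/tree document around a deep copy of $node.
sub wrap_in_treebank {
    my ( $node, %atts ) = @_;

    my $root = XML::Twig::Elt->new( treebank => {%atts} );
    my $doc  = XML::Twig->new( pretty_print => 'indented' );
    $doc->set_root($root);

    my $tree = $root->insert_new_elt('trees')->insert_new_elt('tree');

    # copy() is a deep copy, so the source twig is left untouched and
    # nested nodes can still be visited later in the loop.
    $node->copy->paste( last_child => $tree );

    return $doc;
}

# Sample input matching the structure shown in the question.
my $twig = XML::Twig->new->parse(
    '<xml><node begin="0"><node a="b"></node></node></xml>');

my %hash = ( somekey => 1 );    # stand-in for the %hash in the question

my @docs;
foreach my $node ( $twig->root->first_child('node')->descendants_or_self('node') ) {
    my $doc = wrap_in_treebank( $node, id => 'someid' );
    $doc->root->set_att( c => 'd' ) if exists $hash{somekey};
    push @docs, $doc;
}

print $_->sprint, "\n" for @docs;
```

Each entry in @docs is an independent twig with treebank as its root, so attributes set through $doc->root land on the wrapper rather than on the original node.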
    

    You can separate this out using the cut and paste methods of XML::Twig,