Remove the tags of an XML node with a certain attribute using Perl XML::LibXML


I am using the Perl XML::LibXML module to manipulate an XML file.

I want to remove the opening and closing tags of an XML node if it has a certain attribute, so that its text and child nodes become part of the node's parent.

Here's an unsuccessful attempt. It fails with an insertBefore/insertAfter: HIERARCHY_REQUEST_ERR:

#!/usr/bin/env perl
use 5.020;
use warnings;
use XML::LibXML;

#the input xml

my $inputstr = <<XML;
<root>
<a>
<b class="deletethistag">keep this text<c>keep this c node</c>keep this text too</b>
<b class="someothertag">don't change this</b>
<b>don't change this node without an attribute</b>
<c class="type1">don't change this either</c>
</a>
</root>
XML

my $desiredstr = <<XML ;
<root>
<a>keep this text<c>keep this c node</c>keep this text too
<b class="someothertag">don't change this</b>
<b>don't change this node without an attribute</b>
<c class="type1">don't change this either</c>
</a>
</root>
XML

my $dom = XML::LibXML->load_xml(
string => $inputstr
);

# Convert $inputstr to $desiredstr *** doesn't work ***
foreach my $node ($dom->findnodes(q#//a/b[@class="deletethistag"]/*#)) {
    my $nodestring = $node->toString(1);
    say STDERR $nodestring;
    my $replacementnode = XML::LibXML->load_xml(string => $nodestring);
    $node->parentNode()->insertAfter($replacementnode, $node);
    $node->unbindNode();
    }
say $dom->toString(1);

I want to use the code to remove <span lang="en" xml:space="preserve">...</span> markup from a file, but I have framed it as a more general question so that I understand more of the details of working with XML::LibXML.


Solution

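  • $node->childNodes() returns all the child nodes of $node, text nodes included. (The XPath step /* in the attempt above selects only element children and skips the text nodes. In addition, load_xml() returns a whole document, and a document cannot be inserted as a child of an element, hence the HIERARCHY_REQUEST_ERR.)

    Insert all the children of $node into $node's parent just before $node, in their original order. Then delete the now-empty original $node with $node->unbindNode().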

    Here's a working script:

    #!/usr/bin/env perl
    use 5.020;
    use warnings;
    use XML::LibXML;
    
    #the input xml
    my $inputstr = <<XML;
    <root>
    <a>
    <b class="deletethistag">keep this text<c>keep this c node</c>keep this text too</b>
    <b class="someothertag">don't change this</b>
    <b>don't change this node without an attribute</b>
    <c class="type1">don't change this either</c>
    </a>
    </root>
    XML
    
    # the expected output, for reference only (the script does not use it)
    my $desiredstr = <<XML ;
    <root>
    <a>
    keep this text<c>keep this c node</c>keep this text too
    <b class="someothertag">don't change this</b>
    <b>don't change this node without an attribute</b>
    <c class="type1">don't change this either</c>
    </a>
    </root>
    XML
    
    my $dom = XML::LibXML->load_xml(
    string => $inputstr
    );
    
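    # Unwrap every matching <b>: move its children up, then drop the empty element
    for my $node ($dom->findnodes(q#//a/b[@class="deletethistag"]#)) {
        my $parent = $node->parentNode();
        # childNodes() returns a list copy, so moving nodes while looping is safe
        for my $child_node ( $node->childNodes() ) {
            # insertBefore() moves the child out of $node and places it in front of $node
            $parent->insertBefore($child_node, $node);
            }
        $node->unbindNode();
        }
    say $dom->toString();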
    

    H/T: https://stackoverflow.com/a/31680169/22989509
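
For the <span lang="en" xml:space="preserve">...</span> case mentioned in the question, the same steps can be wrapped in a small helper. This is only a sketch: unwrap_node is a hypothetical name (it is not part of XML::LibXML), the sample input is made up, and the XPath assumes that both attributes must be present on the span.

```perl
use 5.020;
use warnings;
use XML::LibXML;

# Hypothetical helper (not part of XML::LibXML): replace an element with its own children.
sub unwrap_node {
    my ($node) = @_;
    my $parent = $node->parentNode() or return;    # skip nodes that are already detached
    # childNodes() includes text nodes, so mixed content keeps its order
    for my $child ( $node->childNodes() ) {
        $parent->insertBefore( $child, $node );
    }
    $node->unbindNode();
    return;
}

# Made-up sample input, for illustration only
my $xml = q{<p>before <span lang="en" xml:space="preserve">kept <i>inline</i> text</span> after <span lang="fr">untouched</span></p>};
my $dom = XML::LibXML->load_xml( string => $xml );

# Match only spans that carry both attributes from the question
for my $span ( $dom->findnodes(q#//span[@lang="en"][@xml:space="preserve"]#) ) {
    unwrap_node($span);
}

my $result = $dom->documentElement()->toString();
say $result;
```

The second span is left alone because it does not match the predicate. If the xml:space attribute is not always present in the real file, loosening the XPath to //span[@lang="en"] may be a better fit.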