I'm currently using a Perl script with XML::LibXML to process an XML file. This works well enough, but I struggle when a node contains both child elements and free text. An example input:
```xml
<Errors>
  <Error>
    this node works fine
  </Error>
  <Error>
    some text <testTag>with a node</testTag> in between
  </Error>
</Errors>
```
Expected output:
```xml
<Errors>
  <Error>
    this node works fine
  </Error>
  <Error>
    some text HELLOwith a nodeHELLO in between
  </Error>
</Errors>
```
I tried `replaceChild("HELLO", $testTagNode);` to replace the node with a string, which I could then (if needed) process further with a simple search-replace, but I only get a "not a blessed reference" error. (I feel like that would have been pretty dirty if it had actually worked that way.)
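For reference, XML::LibXML's `replaceChild` takes two node objects, `$parent->replaceChild($new_node, $old_node)`, so passing the plain string `"HELLO"` is the likely cause of the "not a blessed reference" error. A minimal sketch, assuming a whitespace-free copy of the mixed-content sample and building a real text node with `createTextNode`. Note that it flattens any markup nested inside `testTag` into plain text, unlike the move-the-children approach.

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string =>
    '<Errors><Error>some text <testTag>with a node</testTag> in between</Error></Errors>');

for my $tag ($doc->findnodes('//testTag')) {
    # Build a real text node owned by $doc; a plain Perl string is not a node.
    my $text = $doc->createTextNode('HELLO' . $tag->textContent . 'HELLO');
    # Signature: $parent->replaceChild($new_node, $old_node)
    $tag->parentNode->replaceChild($text, $tag);
}

print $doc->documentElement->toString, "\n";
# <Errors><Error>some text HELLOwith a nodeHELLO in between</Error></Errors>
```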
If I try to run a simple search-replace directly on the parent node, like this:

```perl
$error =~ s/\</HELLO/g;
```

it never triggers (whether or not I escape the `<`), because LibXML seems to ignore every tag I don't explicitly ask for. If I print the second Error, I also get only:

```
some text with a node in between
```

which is actually very nice behaviour for the rest of the file, but not in this case.
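The behaviour above is consistent with reading the element's text value rather than its markup: `textContent` concatenates the descendant text nodes, while `toString` serializes the element including its child tags. A small sketch, assuming `$error` was fetched with `findnodes` (the question does not show how it was obtained) and using only the mixed-content `Error` from the sample, with whitespace removed:

```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(string =>
    '<Errors><Error>some text <testTag>with a node</testTag> in between</Error></Errors>');
my ($error) = $doc->findnodes('/Errors/Error');

# Text value only: the child element's tags are not part of it.
my $text = $error->textContent;   # "some text with a node in between"

# Serialized markup: child tags are included, so "<" is present.
my $xml = $error->toString;       # "<Error>some text <testTag>with a node</testTag> in between</Error>"

print "$text\n$xml\n";
```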
I can, however, do

```perl
$error->removeChild($testTagNode);
```

which shows that the node is actually found, but that doesn't get me any further. In theory I could remove the node, save its content, and insert the content back into the parent; the problem is that it has to go back at exactly the position where it was before. The only other option I can think of is to read the entire file in as a string and run the search-replace over it BEFORE feeding it into LibXML, but that could add a lot of overhead and isn't a very nice solution.
I feel like I'm overlooking something substantial, as this looks like a pretty basic task, but I can't seem to find anything. Maybe I'm just looking in the wrong direction and there is a completely different approach. Any help is appreciated.
Removing the `testTag` element would remove all of its children too, so we must move the children of each `testTag` element into the parent of the `testTag` element before deleting the `testTag` element. In XML::LibXML, this is done as follows (tested):
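```perl
use strict;
use warnings;
use XML::LibXML;

my $doc = XML::LibXML->load_xml(location => 'errors.xml');  # hypothetical input path

for my $node ($doc->findnodes('/Errors/Error//testTag')) {
    my $parent = $node->parentNode();
    # Insert "HELLO", the element's children, then "HELLO" in front of it.
    for my $child_node (
        XML::LibXML::Text->new("HELLO"),
        $node->childNodes(),
        XML::LibXML::Text->new("HELLO"),
    ) {
        $parent->insertBefore($child_node, $node);  # moves existing children
    }
    $node->unbindNode();  # detach the now-empty testTag element
}

print $doc->toString();
```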
Notes:

- Handles `testTag` elements with any number of text and element children.
- Handles `testTag` elements that aren't direct children of `Error` elements, and even nested `testTag` elements. (Use `/Errors/Error/testTag` instead of `/Errors/Error//testTag` if you only want to handle direct children of `Error` elements.)
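To illustrate the notes, here is a sketch that packages the loop above as a reusable sub (the name `unwrap_with_markers` and its parameters are hypothetical) and runs it on an input with nested `testTag` elements. It relies on `findnodes` returning nodes in document order, so the outer element is unwrapped first and the inner one, by then a child of `Error`, is handled on the next iteration.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper wrapping the loop above: every element matched by
# $xpath is replaced by $marker, its own children, and $marker again.
sub unwrap_with_markers {
    my ($doc, $xpath, $marker) = @_;
    for my $node ($doc->findnodes($xpath)) {
        my $parent = $node->parentNode();
        for my $child_node (
            XML::LibXML::Text->new($marker),
            $node->childNodes(),
            XML::LibXML::Text->new($marker),
        ) {
            $parent->insertBefore($child_node, $node);
        }
        $node->unbindNode();
    }
    return $doc;
}

# Nested testTag elements: the outer one is processed first, which moves
# the inner one up into Error before it is processed in turn.
my $doc = XML::LibXML->load_xml(string =>
    '<Errors><Error>a <testTag>b <testTag>c</testTag> d</testTag> e</Error></Errors>');
unwrap_with_markers($doc, '/Errors/Error//testTag', 'HELLO');

my ($error) = $doc->findnodes('/Errors/Error');
print $error->toString, "\n";   # <Error>a HELLOb HELLOcHELLO dHELLO e</Error>
```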