I honestly tried to find a solution for PHP, but many threads that sound similar either don't apply to my case or are for completely different languages.
I want to split an XML file based on nodes. Ideally on multiple nodes, but of course one is enough, since the split could be applied multiple times.
For example, I want to split this by the tags <thingy> and <othernode>:
<root>
<stuff />
<thingy><othernode>one</othernode></thingy>
<thingy><othernode>two</othernode></thingy>
<thingy>
<othernode>three</othernode>
<othernode>four</othernode>
</thingy>
<some other="data"/>
</root>
Ideally I want to get 4 XML strings of the form:
<root>
<stuff />
<thingy><othernode>CONTENT</othernode></thingy>
<some other="data"/>
</root>
With CONTENT being one, two, three and four, respectively. Plot twist: CONTENT can also be a whole subtree. Of course, everything can also contain various namespaces and tag prefixes (like <q1:node/>). Formatting is irrelevant to me.
So far my best guess is something with DOMDocument, node cloning, and removing everything but one node?
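That clone-and-prune idea can be sketched as follows, assuming the split elements and their containers can be found by plain tag name. For every <othernode>, the whole document is deep-copied, all other <othernode> elements are removed from the copy, and then every <thingy> left without child elements is removed as well. splitByClone() and its parameters are hypothetical names for this illustration, not an existing API. The original document is never modified; the trade-off is one full copy per part, and the surviving <thingy> keeps its own position among its siblings.

```php
<?php
// Clone-and-prune sketch: one deep copy of the document per split element.
// splitByClone(), $splitName and $containerName are hypothetical names.
function splitByClone(DOMDocument $doc, string $splitName, string $containerName): array
{
    $parts = [];
    $count = $doc->getElementsByTagName($splitName)->length;

    for ($keep = 0; $keep < $count; $keep++) {
        $copy = clone $doc; // deep copy; the original $doc stays untouched

        // Snapshot the live node list into an array before removing anything.
        $targets = iterator_to_array($copy->getElementsByTagName($splitName), false);
        foreach ($targets as $i => $node) {
            if ($i !== $keep) {
                $node->parentNode->removeChild($node);
            }
        }

        // Drop containers (e.g. <thingy>) that no longer have any child elements.
        $containers = iterator_to_array($copy->getElementsByTagName($containerName), false);
        foreach ($containers as $container) {
            if ($container->childElementCount === 0) {
                $container->parentNode->removeChild($container);
            }
        }

        $parts[] = $copy->saveXML($copy->documentElement);
    }

    return $parts;
}

// The sample from above, without whitespace so the output is compact.
$doc = new DOMDocument();
$doc->loadXML(
    '<root><stuff/>'
    . '<thingy><othernode>one</othernode></thingy>'
    . '<thingy><othernode>two</othernode></thingy>'
    . '<thingy><othernode>three</othernode><othernode>four</othernode></thingy>'
    . '<some other="data"/></root>'
);

$parts = splitByClone($doc, 'othernode', 'thingy');
foreach ($parts as $part) {
    echo $part, "\n";
}
```

Matching by tag name keeps the sketch short; for prefixed or namespaced elements, the getElementsByTagName() calls would need to be swapped for a namespace-aware lookup.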
Interesting question.
If I understand it correctly, it is given that <othernode> is always a child of <thingy>, and that the split produces one document per <othernode>, placed where the first <thingy> is in the original document.
DOMDocument seemed useful in this case, as it makes it easy to move nodes around, including all of their children.
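To illustrate that point: appending a node that is already part of the tree moves it, together with its whole subtree, instead of copying it. A minimal sketch with a made-up document:

```php
<?php
// appendChild() on a node that is already in the document moves it
// (with all of its descendants) instead of copying it.
$doc = new DOMDocument();
$doc->loadXML('<root><a><thingy><othernode>one</othernode></thingy></a><b/></root>');

$thingy = $doc->getElementsByTagName('thingy')->item(0);
$b = $doc->getElementsByTagName('b')->item(0);

$b->appendChild($thingy); // <thingy> and its <othernode> child now live under <b>

$result = $doc->saveXML($doc->documentElement);
echo $result, "\n";
```

The same move semantics apply when the target is a DOMDocumentFragment, which is what the $split function relies on.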
Given a $split function that takes a node list (here from getElementsByTagName()), the usage is:
// $doc is a DOMDocument that already holds the question's XML.
echo "---\n";
foreach ($split($doc->getElementsByTagName('othernode')) as $part) {
    echo $part->saveXML(), "---\n";
}
The $split function moves all <othernode> elements into a DOMDocumentFragment of their own, removing each <thingy> parent once it has been emptied (unless it is the first one, which serves as the anchor), and then temporarily brings each element back into the DOMDocument:
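$split = static function (DOMNodeList $nodes): Generator {
    // Phase 1: move every matched element into a document fragment ("basin").
    // $nodes is a live list, so item(0) is always the next element still in the document.
    while (($element = $nodes->item(0)) && $element instanceof DOMElement) {
        $doc ??= $element->ownerDocument;
        $basin ??= $doc->createDocumentFragment();
        $anchor ??= $element->parentNode; // the first <thingy> stays as the insertion point

        $parent = $element->parentNode;
        $basin->appendChild($element); // moves the element (with its subtree) out of the document

        // Remove a parent that has no child elements left, unless it is the anchor.
        if ($parent !== $anchor && $parent->childElementCount === 0) {
            $parent->parentNode->removeChild($parent);
        }
    }

    if (empty($anchor)) {
        return; // nothing matched
    }
    assert(isset($basin, $doc));

    // Phase 2: put each element back into the anchor, yield the document, then take it out again.
    while ($element = $basin->childNodes->item(0)) {
        $element = $anchor->appendChild($element);
        yield $doc;
        $anchor->removeChild($element);
    }
};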
This results in the following split:
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>one</othernode></thingy>
<some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>two</othernode></thingy>
<some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>three</othernode></thingy>
<some other="data"/>
</root>
---
<?xml version="1.0"?>
<root>
<stuff/>
<thingy><othernode>four</othernode></thingy>
<some other="data"/>
</root>
---
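The question also allows prefixed names such as <q1:node/>. When the split elements live in a namespace, a node list that selects by local name regardless of prefix can be obtained with DOMDocument::getElementsByTagNameNS() and the "*" wildcard for the namespace URI; that list can then be passed to $split in place of the getElementsByTagName() result. A minimal sketch; the prefixes and namespace URIs below are made up, and the sample only uses namespaced elements, because how "*" treats elements in no namespace is worth checking on your PHP version.

```php
<?php
// Hypothetical sample: split elements under two different prefixes/namespaces.
$xml = <<<'XML'
<root xmlns:q1="urn:example:q1" xmlns:q2="urn:example:q2">
  <stuff/>
  <q1:thingy><q1:othernode>one</q1:othernode></q1:thingy>
  <q2:thingy><q2:othernode>two</q2:othernode></q2:thingy>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);

// "*" as the namespace URI matches any namespace, so elements are selected
// by their local name ("othernode") whatever their prefix is.
$nodes = $doc->getElementsByTagNameNS('*', 'othernode');

$found = [];
foreach ($nodes as $node) {
    // nodeName is the qualified name (with prefix); localName is without it.
    $found[] = $node->nodeName . ' / ' . $node->localName . ' = ' . $node->textContent;
}
echo implode("\n", $found), "\n";
```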