I am parsing XML in PHP with SimpleXML and have an XML document like this:
<xml>
<element>
textpart1
<subelement>subcontent1</subelement>
textpart2
<subelement>subcontent2</subelement>
textpart3
</element>
</xml>
When I do $xml->element, it naturally gives me the whole element, i.e. all three text parts together.
So if I parse this into an array (with a foreach over the children), I get:
0 => textpart1textpart2textpart3, 1 => subcontent1, 2 => subcontent2
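The behaviour described above can be reproduced with a short script. This is a sketch that assumes the same XML with the line breaks removed, so the strings come out without surrounding whitespace. Casting a SimpleXMLElement to string returns only the text sitting directly inside it, concatenated, so the position of each text part relative to the <subelement> children is lost.

```php
<?php
// Same structure as the XML in the question, minus the whitespace.
$xml = simplexml_load_string(
    '<xml><element>textpart1<subelement>subcontent1</subelement>'
    . 'textpart2<subelement>subcontent2</subelement>textpart3</element></xml>'
);

// (string) on <element> concatenates its direct text nodes only.
$result = [(string) $xml->element];

// children() yields the two <subelement> nodes; text nodes are not visited.
foreach ($xml->element->children() as $child) {
    $result[] = (string) $child;
}
// $result holds 'textpart1textpart2textpart3', 'subcontent1', 'subcontent2'
```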
I need a way to parse the <element> node so that each text part that ends at, or begins after, a subelement is treated as its own element.
As a result, I am looking for an ordered list that could be expressed as an array like this:
0 => textpart1, 1 => subcontent1, 2 => textpart2, 3 => subcontent2, 4 => textpart3
Is that possible without altering the XML file? Thanks in advance for any hints!
As others have said, SimpleXML doesn't have any support for accessing individual text nodes as separate entities, so you will need to supplement it with some DOM methods. Thankfully, you can switch between the two at will using dom_import_simplexml and simplexml_import_dom.
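A minimal sketch of switching between the two APIs, using a made-up snippet of XML. The DOM object and the SimpleXML object are just two views of the same node in one underlying document, so the DOM side can see the text node SimpleXML hides, and changes made through one view show up in the other.

```php
<?php
$sx = simplexml_load_string('<xml><element>text<subelement>sub</subelement></element></xml>');

// SimpleXML -> DOM: a DOMElement for the very same <element> node
$dom = dom_import_simplexml($sx->element);

// The DOM view exposes the leading text node as a node of its own
$first_is_text = $dom->firstChild->nodeType === XML_TEXT_NODE;

// An attribute set through DOM is visible through SimpleXML,
// because both objects share one document
$dom->setAttribute('id', 'e1');

// DOM -> SimpleXML: back to a SimpleXMLElement for the same node
$back = simplexml_import_dom($dom);
```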
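The key pieces of DOM functionality you need are:
- the childNodes property, which lists all child nodes of an element, text nodes included, in document order;
- the nodeType property, which tells you whether each child is an element (XML_ELEMENT_NODE) or a text node (XML_TEXT_NODE);
- the nodeValue property, which gives you the content of a text node.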
Given those, you can write a function that returns an array mixing SimpleXML objects (for child elements) and strings (for child text nodes), something like this:
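function get_child_elements_and_text_nodes($sx_element)
{
    $return = array();
    // Switch to the DOM view of the same node, which exposes text nodes
    $dom_element = dom_import_simplexml($sx_element);
    foreach ( $dom_element->childNodes as $dom_child )
    {
        switch ( $dom_child->nodeType )
        {
            case XML_TEXT_NODE:
                // Text becomes a plain string (surrounding whitespace is kept)
                $return[] = $dom_child->nodeValue;
                break;
            case XML_ELEMENT_NODE:
                // Child elements are converted back to SimpleXML objects
                $return[] = simplexml_import_dom($dom_child);
                break;
            // Other node types (comments, CDATA, etc.) are skipped
        }
    }
    return $return;
}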
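Applied to the XML from the question, the function above returns the text parts with their line breaks attached (e.g. "\ntextpart1\n"). The following sketch is a variant with a hypothetical name, get_child_parts, under two assumptions that are not in the original: surrounding whitespace is trimmed and whitespace-only text nodes are dropped, and CDATA sections are treated like ordinary text. Casting the result to strings gives exactly the ordered list asked for.

```php
<?php
// Hypothetical variant of get_child_elements_and_text_nodes():
// trims text, skips whitespace-only nodes, and accepts CDATA as text.
function get_child_parts(SimpleXMLElement $sx_element): array
{
    $parts = [];
    foreach (dom_import_simplexml($sx_element)->childNodes as $dom_child) {
        switch ($dom_child->nodeType) {
            case XML_TEXT_NODE:
            case XML_CDATA_SECTION_NODE:
                $text = trim($dom_child->nodeValue);
                if ($text !== '') {
                    $parts[] = $text;
                }
                break;
            case XML_ELEMENT_NODE:
                $parts[] = simplexml_import_dom($dom_child);
                break;
        }
    }
    return $parts;
}

$source = <<<XML
<xml>
<element>
textpart1
<subelement>subcontent1</subelement>
textpart2
<subelement>subcontent2</subelement>
textpart3
</element>
</xml>
XML;

$xml = simplexml_load_string($source);
$parts = get_child_parts($xml->element);

// Elements come back as SimpleXMLElement objects; strval() flattens them
$flat = array_map('strval', $parts);
```

Keeping the SimpleXMLElement objects in $parts (rather than strings) leaves you free to read attributes or deeper children of each <subelement> before flattening.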
In your case you need to recurse down the tree, and mixing DOM and SimpleXML at every level gets a little confusing. Instead, you could write the recursion entirely in DOM and convert the SimpleXML object once, before calling it:
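function recursively_find_text_nodes($dom_element)
{
    $return = array();
    foreach ( $dom_element->childNodes as $dom_child )
    {
        switch ( $dom_child->nodeType )
        {
            case XML_TEXT_NODE:
                // Collect the text (surrounding whitespace is kept)
                $return[] = $dom_child->nodeValue;
                break;
            case XML_ELEMENT_NODE:
                // Descend into the child element and append its text nodes,
                // which preserves document order
                $return = array_merge($return, recursively_find_text_nodes($dom_child));
                break;
        }
    }
    return $return;
}

// $simplexml is the root <xml> element, e.g. from simplexml_load_file()
$text_nodes = recursively_find_text_nodes(dom_import_simplexml($simplexml->element));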
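If you would rather not write the recursion by hand, DOMXPath can walk the tree for you. This is an alternative technique, not part of the answer above: the XPath expression .//text(), evaluated with <element> as the context node, selects every descendant text node in document order, which is the same set recursively_find_text_nodes() collects. As in the previous variant, trimming and dropping whitespace-only nodes are assumptions made here to match the desired output.

```php
<?php
$source = <<<XML
<xml>
<element>
textpart1
<subelement>subcontent1</subelement>
textpart2
<subelement>subcontent2</subelement>
textpart3
</element>
</xml>
XML;

$simplexml = simplexml_load_string($source);
$dom_element = dom_import_simplexml($simplexml->element);
$xpath = new DOMXPath($dom_element->ownerDocument);

$text_nodes = [];
// './/text()' is relative to $dom_element: all descendant text nodes
foreach ($xpath->query('.//text()', $dom_element) as $node) {
    // Assumption: surrounding whitespace is noise, so trim and skip empties
    $text = trim($node->nodeValue);
    if ($text !== '') {
        $text_nodes[] = $text;
    }
}
```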