php xpath simplexml domxpath

Get node and subnode text value in SimpleXML


I need to find a way to extract the text content of a node plus its subnodes using SimpleXML.

Here is a dummy example:

<library>
    <book>
        <title>I love <i>apple pie</i></title>
    </book>
    <book>
        <title>I love <i>chocolate</i></title>
    </book>
</library>

Assuming this XML is stored in a $xml variable, here is my PHP:

$sxml = simplexml_load_string($xml);
foreach($sxml->xpath('//book') as $book){
    // echo $book->title; prints "I love"
    // var_dump($book->xpath('string(title)')); prints an empty array
    // What I want: "I love apple pie" on the first iteration of the foreach, "I love chocolate" on the second
}

Am I missing something here? Strangely enough, SimpleXML seems to recognize the XPath string() function, but it doesn't seem to evaluate it. Is this a bug? Is there any workaround?
I have started to explore a solution with DOMDocument and DOMXPath::query, but the query seems to go back to the root element of the document, which is not the behaviour I want inside the foreach...

Thanks a lot!


Solution

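  • $book->title is of type SimpleXMLElement, and casting it to a string returns only its own text nodes, not the text of child elements such as <i>. You could pass it to dom_import_simplexml and read the nodeValue, which includes the text of all descendants: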

    $sxml = simplexml_load_string($xml);
    foreach($sxml->xpath('//book') as $book){
        echo dom_import_simplexml($book->title)->nodeValue . "<br>";
        // or use
        // echo strip_tags($book->title->asXml()) . "<br>";
    }
    

    That will give you:

    I love apple pie
    I love chocolate
    

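For the DOMXPath route mentioned in the question, here is a minimal sketch, assuming the same sample XML as above. As far as I know, SimpleXMLElement::xpath only returns node lists, so a string-valued expression such as string(title) comes back as an empty array. DOMXPath::evaluate can return the string directly, and its second argument is the context node. With a relative expression (title, not //title), the lookup stays inside the current book instead of restarting from the document root. The $titles array is only there for illustration.

```php
<?php
// Sample document from the question.
$xml = <<<XML
<library>
    <book>
        <title>I love <i>apple pie</i></title>
    </book>
    <book>
        <title>I love <i>chocolate</i></title>
    </book>
</library>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

$titles = []; // illustrative: collects one string per book
foreach ($xpath->query('//book') as $book) {
    // Relative expression evaluated against the current <book> node.
    // string() yields the concatenated text of <title> and its descendants.
    $titles[] = $xpath->evaluate('string(title)', $book);
}

foreach ($titles as $title) {
    echo $title . PHP_EOL; // "I love apple pie", then "I love chocolate"
}
```

If you are already holding a SimpleXMLElement, the dom_import_simplexml approach above is shorter; this variant is mostly useful when the rest of the code works with DOMDocument anyway.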