Tags: php, dom, glob, nodevalue

How to get the HTML inside a $node rather than just its $nodeValue


Description of the current situation:

I have a folder full of pages (pages-folder). Among other things, each page in that folder has a div with id="short-info".
I have code that pulls every <div id="short-info">...</div> from that folder and displays the text inside it using textContent (which, for this purpose, is the same as nodeValue).

The code that loads the divs:

<?php
// Collect the page files and process them in sorted order
$filenames = glob("pages-folder/*.php");
sort($filenames);
foreach ($filenames as $filename) {
    $doc = new DOMDocument();
    $doc->loadHTMLFile($filename);
    $xpath = new DOMXPath($doc);
    // Every <div id="short-info"> in the page
    $elements = $xpath->query("//div[@id='short-info']");

    foreach ($elements as $element) {
        foreach ($element->childNodes as $node) {
            // textContent is text only: an <img> child contributes ""
            echo $node->textContent;
        }
    }
}
?>

The problem is that when the div contains a child element such as an image, for example <div id="short-info"> <img src="picture.jpg"> Hello world </div>, the output is only Hello world instead of the image followed by Hello world.
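A minimal sketch that reproduces the behaviour without the pages folder, assuming the sample markup from the question is loaded from a string with loadHTML() instead of from files; shortInfoText() is a hypothetical name. It runs the same inner loop and shows why the image disappears: the textContent of an element contains only its text, and an <img> has none.

```php
<?php
// Hypothetical helper: the question's inner loop, applied to an HTML string
function shortInfoText(string $html): string
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    $out = '';
    foreach ($xpath->query("//div[@id='short-info']") as $element) {
        foreach ($element->childNodes as $node) {
            // The <img> element has no text, so its textContent is ""
            $out .= $node->textContent;
        }
    }
    return $out;
}

echo shortInfoText('<div id="short-info"> <img src="picture.jpg"> Hello world </div>'), PHP_EOL;
```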

Question:

How can I make the code output the full HTML inside <div id="short-info">, including for instance that image, rather than just the text?


Solution

  • You have to call a little-documented method on the node.

    $node->c14n() will give you the markup of $node itself (including its own tags), serialized as canonical XML.

    Crazy, right? I lost some hair over that one.

    http://php.net/manual/en/class.domnode.php#88441

    Update

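    This serializes the node as canonical XML rather than HTML, so the markup is altered (for example, a void element such as <img> gets an explicit closing tag). It is better to use

    $html = $node->ownerDocument->saveHTML($node);

    instead. Passing a node to saveHTML() requires PHP 5.3.6 or later. Like c14n(), it includes the node's own tags, so to get only what is inside the div, call it on each child node.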
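A sketch of how the answer could be applied to the question's loop, assuming PHP 7.1 or later. innerHtml() and printShortInfo() are hypothetical names, not part of ext/dom; the inline sample string is taken from the question and only serves to compare c14n() with saveHTML() on the <img> child.

```php
<?php
// Hypothetical helper: concatenate saveHTML() of every child node,
// which yields the "inner HTML" of $element without its own tags.
function innerHtml(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

// The question's loop, echoing markup instead of textContent
function printShortInfo(string $pattern): void
{
    $filenames = glob($pattern) ?: [];            // glob() may return false on error
    sort($filenames);
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    foreach ($filenames as $filename) {
        $doc = new DOMDocument();
        $doc->loadHTMLFile($filename);
        $xpath = new DOMXPath($doc);
        foreach ($xpath->query("//div[@id='short-info']") as $element) {
            echo innerHtml($element);
        }
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
}

// Compare both serializations of the <img> child on the sample markup
$doc = new DOMDocument();
$doc->loadHTML('<div id="short-info"> <img src="picture.jpg"> Hello world </div>');
$img = $doc->getElementsByTagName('img')->item(0);
echo $img->C14N(), PHP_EOL;         // canonical XML serialization
echo $doc->saveHTML($img), PHP_EOL; // HTML serialization

printShortInfo('pages-folder/*.php');
```

Iterating over the children is what turns saveHTML(), which returns a node together with its own tags, into an inner-HTML helper. Note that loadHTMLFile() parses the .php files as they are on disk, so any PHP code inside them is not executed.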