Why does the node hierarchy in a PHP DOMDocument not match the given html hierarchy?


I have an HTML string that is well-formed.
After I load it into a PHP DOMDocument object and walk the node tree, the tree I print out is wrong.
The printed node tree does not match the HTML:

The table node is inside a #text node.
The 2nd td node is inside the first td node.
The 2nd tr node is inside the first tr node.
The 4th td node is inside the 3rd td node.
The #text 'after' is inside the table node.

Why is this wrong and how can I fix it?

The code below is executed here:
https://dev.aecperformance.com/test.php

//Formatted so you can easily see the format
$html = "<body>
            <div style='border:1px solid blue; padding:10px;' >
                This is a <b>bold <span style='color:red'>red test</span></b> a table 
                <table style='display:inline-block; border:1px solid green; padding:0'>
                    <tr><td>Head 1</td><td>Head 2</td></tr>
                    <tr><td>Value 1</td><td>Value 2</td></tr>
                </table>
                after 
            </div>
            After div
        </body>";
//Formatted with all tabs and line feeds stripped
$html = "<body><div style='border:1px solid blue; padding:10px;' >This is a <b>bold <span style='color:red'>red test</span></b> a table<table style='display:inline-block; border:1px solid green; padding:0'><tr><td>Head 1</td><td>Head 2</td></tr><tr><td>Value 1</td><td>Value 2</td></tr></table>after</div>After div</body>";

$nNxtLvl = 0;
function processChildNodes($node)
{       
    global $nNxtLvl;
    
    $lvl = $nNxtLvl;    
    for($i=0; $i < $lvl; $i++) {
        echo "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;";
    }
    echo $node->nodeName;
    if($node->nodeName == "#text") echo " " . $node->nodeValue;
    echo "<br>";
    $cNodes = $node->childNodes;
    if (!empty($cNodes)) {
        foreach ($cNodes as $cNode) {               
            $nNxtLvl++;
            processChildNodes($cNode);              
        }
    }
    $nNxtLvl = $lvl;        
}

$dom = new \DOMDocument();
$dom->loadHTML($html);
$ls = $dom->getElementsByTagName('body');
$elBody = $ls[0];
$ls = $elBody->childNodes;
for($i=0; $i < count($ls); $i++) {
    processChildNodes($ls->item($i));      
}

Solution

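  • You put $nNxtLvl++; inside the child node loop, so the level is increased on every iteration, i.e. once for every sibling node. Each sibling is therefore printed one level deeper than the previous one, which makes it look as if it were nested inside it. The DOM tree itself is correct; only the printed indentation is wrong.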

    You can fix it just by moving it outside the loop:

        if (!empty($cNodes)) {
            $nNxtLvl++; // <== Moved outside the loop
            foreach ($cNodes as $cNode) {
                processChildNodes($cNode);
            }
        }
    

    Some other advice about your code:

    • Avoid global variables (they are rarely used in modern PHP); instead, pass the level as a second argument to your function processChildNodes().
    • You can drop the for($i...) loop: since you only want to echo a repeated string, you can use str_repeat(), for example echo str_repeat("&nbsp;", 8 * $lvl); instead of the whole loop.
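
Putting both suggestions together, the function might look roughly like the sketch below. It is only one possible rewrite: the $lvl parameter, the type declarations and the hasChildNodes() guard are additions for illustration and are not part of the original code, and the HTML string is a shortened version of the one in the question.

```php
<?php
// Sketch: the depth is passed as an argument ($lvl, a hypothetical
// parameter name) instead of being tracked in a global variable.
function processChildNodes(DOMNode $node, int $lvl = 0): void
{
    // Indent by 8 non-breaking spaces per level, as the original loop did.
    echo str_repeat("&nbsp;", 8 * $lvl);
    echo $node->nodeName;
    if ($node->nodeName === "#text") {
        echo " " . $node->nodeValue;
    }
    echo "<br>";

    // Every child of this node is exactly one level deeper than the node
    // itself, no matter how many siblings were printed before it.
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $cNode) {
            processChildNodes($cNode, $lvl + 1);
        }
    }
}

// Shortened version of the HTML from the question.
$html = "<body><div>This is a <b>bold</b> test<table><tr><td>Head 1</td><td>Head 2</td></tr></table>after</div>After div</body>";

$dom = new DOMDocument();
$dom->loadHTML($html);
$elBody = $dom->getElementsByTagName('body')->item(0);
foreach ($elBody->childNodes as $child) {
    processChildNodes($child);
}
```

Because the level is computed from the parent's level on each call, there is no shared counter left to drift, and sibling nodes such as the two td elements are printed at the same depth.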