How to get all elements inside body with PHP DomDocument

I'm trying to parse an HTML string that may contain any valid HTML tags. I used this code to parse the string:

$doc = new DOMDocument();
$doc->loadHTML($product['description']); // comes from db
$els = $doc->getElementsByTagName('*');
foreach ($els as $node) {
    o($node->nodeName.' '.$node->nodeValue);
}

This does print my tags but the first two tags are html and body. I want to ignore those. The string from the db does not contain html or body tags. Here's an example:

<p>This is a paragraph</p>
<ol>
    <li>This is a list</li>
</ol>

I was wondering if there's a way to iterate over tags inside the body only. I tried these

$els = $doc->getElementsByTagName('body *');

$body = $doc->getElementsByTagName('body');
$els = $body->getElementsByTagName('*');

Neither works. I have seen others use XPath, but that gives me headaches. Can it be done with DOMDocument alone?

Solution

When you call DOMDocument::loadHTML() on a fragment, the underlying libxml2 HTML parser adds the implied <html> and <body> elements (plus a doctype) so that the result is a complete document. That is why html and body show up first in your output even though the string from the database contains neither.
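To see the wrapping for yourself, here is a minimal sketch that loads a single paragraph and walks down from the document root. The fragment is taken from the question; the variable names are only for illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<p>This is a paragraph</p>');

// documentElement is the root element the parser created for us.
$root = $doc->documentElement;
$rootName = $root->nodeName;              // the implied <html>
$wrapperName = $root->firstChild->nodeName; // the implied <body>

echo $rootName, "\n";
echo $wrapperName, "\n";
```

On a default PHP build this should print html and then body, neither of which was in the input string.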

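getElementsByTagName() takes a tag name, not a CSS selector, so 'body *' is looked up as a literal tag name and matches nothing. Your second attempt fails because getElementsByTagName('body') returns a DOMNodeList, and a DOMNodeList has no getElementsByTagName() method. Instead, fetch the body element itself with item(0) and call getElementsByTagName('*') on that element, which returns every element nested inside it: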

$doc = new DOMDocument();
$doc->loadHTML($product['description']); // comes from db

// Get the body element
$body = $doc->getElementsByTagName('body')->item(0);

// Check if the body element exists
if ($body) {
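    // Get every element nested anywhere inside the body (all descendants, not just direct children)
    $els = $body->getElementsByTagName('*');

    foreach ($els as $node) {
        // For an element, nodeValue is the combined text of everything inside it
        echo $node->nodeName . ' ' . $node->nodeValue . "\n";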
    }
} else {
    echo "Body tag not found.";
}
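
Note that getElementsByTagName('*') gives you all descendants, so with your sample the li is listed as well as the ol that contains it. If you only want the top-level tags, you can loop over childNodes and skip anything that is not an element. Here is a rough sketch using the sample from the question; directChildElementNames() is a helper name I made up, not part of the DOM extension.

```php
<?php
// Hypothetical helper: tag names of a node's direct element children only.
function directChildElementNames(DOMNode $parent): array
{
    $names = [];
    foreach ($parent->childNodes as $child) {
        // childNodes also contains text nodes (e.g. whitespace) and comments; skip those.
        if ($child instanceof DOMElement) {
            $names[] = $child->nodeName;
        }
    }
    return $names;
}

// Sample fragment from the question (no <html> or <body> of its own).
$html = "<p>This is a paragraph</p>\n<ol>\n    <li>This is a list</li>\n</ol>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$body = $doc->getElementsByTagName('body')->item(0);

$topLevel = [];
$all = [];
if ($body) {
    // Direct children of <body> only.
    $topLevel = directChildElementNames($body);

    // Every element nested anywhere inside <body>, in document order.
    foreach ($body->getElementsByTagName('*') as $node) {
        $all[] = $node->nodeName;
    }
}

echo implode(', ', $topLevel), "\n";
echo implode(', ', $all), "\n";
```

With the sample input, the first line should list p and ol, and the second p, ol and li.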