I'm trying to use the Dom\HTMLDocument class that's new in PHP 8.4.
Let's say I just need to count all divs:
<?php
$html = <<<HTML
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Example</title>
</head>
<body>
<div>Hello</div>
</body>
</html>
HTML;
$doc = Dom\HTMLDocument::createFromString($html);
$xpath = new Dom\XPath($doc);
// No divs found:
$divs = $xpath->query('//div');
echo $divs->count(); // 0
// 6 elements found, including the div:
$anyTags = $xpath->query('//*');
echo $anyTags->count(); // 6
As you can see, when I use * to grab any element, it works as expected and even the div is found.
Why can't I use tag selectors? I tried some fancier selectors with class names etc., and they work properly as long as I use * instead of specific tags.
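By default, Dom\HTMLDocument::createFromString creates all elements in the namespace http://www.w3.org/1999/xhtml. In XPath 1.0, an unprefixed name test such as //div only matches elements that are in no namespace, which is why it finds nothing while //* still matches everything. So if you want to query that document via XPath, you need a namespace-aware query, like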
$xpath->registerNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
$divs = $xpath->query('//xhtml:div');
If you want un-namespaced HTML (as in most use cases), pass Dom\HTML_NO_DEFAULT_NS to Dom\HTMLDocument::createFromString:
$doc = Dom\HTMLDocument::createFromString($html, Dom\HTML_NO_DEFAULT_NS);
$xpath = new Dom\XPath($doc); // create the XPath object for the newly parsed document
$divs = $xpath->query('//div'); // returns 1 div
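To compare the two approaches side by side, here is a minimal sketch assuming PHP 8.4 or later. The helper countMatches is a hypothetical name introduced only for illustration; it is not part of PHP's DOM API. The HTML string is a shortened version of the one in the question.

```php
<?php
// Hypothetical helper (not part of PHP): counts the nodes matched by an
// XPath expression, optionally registering namespace prefixes first.
function countMatches(Dom\HTMLDocument $doc, string $expr, array $namespaces = []): int
{
    $xpath = new Dom\XPath($doc);
    foreach ($namespaces as $prefix => $uri) {
        $xpath->registerNamespace($prefix, $uri);
    }
    return $xpath->query($expr)->length;
}

$html = '<!DOCTYPE html><html><head><title>Example</title></head>'
      . '<body><div>Hello</div></body></html>';

// Default parsing: elements are placed in the XHTML namespace.
$namespaced = Dom\HTMLDocument::createFromString($html);
echo countMatches($namespaced, '//div'), "\n"; // 0
echo countMatches(
    $namespaced,
    '//xhtml:div',
    ['xhtml' => 'http://www.w3.org/1999/xhtml']
), "\n"; // 1

// With Dom\HTML_NO_DEFAULT_NS: elements are in no namespace.
$plain = Dom\HTMLDocument::createFromString($html, Dom\HTML_NO_DEFAULT_NS);
echo countMatches($plain, '//div'), "\n"; // 1
```

The prefix xhtml is arbitrary; only the namespace URI passed to registerNamespace has to match.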