Suppose I have HTML like this:
<div id="container">
<li class="list">
Test text
</li>
</div>
And I want to get the contents of the `li`.
I can get the contents of the container div using this code:
$html = '
<div id="container">
<li class="list">
Test text
</li>
</div>';
$dom = new \DomDocument;
$dom->loadHTML($html);
$xpath = new \DomXPath($dom);
echo $dom->saveHTML($xpath->query("//div[@id='container']")->item(0));
I was hoping I could get the contents of the subelement by simply adding it to the query (like how you can do it in simpleHtmlDom):
echo $dom->saveHTML($xpath->query("//div[@id='container'] li[@class='list']")->item(0));
But a warning (followed by a fatal error) was thrown, saying:
Warning: DOMXPath::query(): Invalid expression ...
The only way I know to do what I want is this:
$html = '
<div id="container">
<li class="list">
Test text
</li>
</div>';
$dom = new \DomDocument;
$dom->loadHTML($html);
$xpath = new \DomXPath($dom);
$dom2 = new \DomDocument;
$dom2->loadHTML(trim($dom->saveHTML($xpath->query("//div[@id='container']")->item(0))));
$xpath2 = new \DomXPath($dom2);
echo $xpath2->query("//li[@class='list']")->item(0)->nodeValue;
However, that's an awful lot of code just to get the contents of the `li`, and the problem is that as items are nested deeper (for example, if I want to get `div#container ul.container li.list`), I have to keep adding more and more code.
With simpleHtmlDom, all I would have had to do is:
$html->find('div#container li.list', 0);
Am I missing an easier way to do things with DomDocument and DomXPath, or is it really this hard?
You were close in your initial attempt; your syntax was just off by a character. Try the following XPath:
//div[@id='container']/li[@class='list']
You can see you had a space between the `div` node and the `li` node where there should be a forward slash.
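For the deeper nesting you mention, here is a minimal sketch, assuming the same sample markup as in the question. In XPath, `/` selects direct children only, while `//` selects descendants at any depth, which is closer to what the space means in a CSS selector. `DOMXPath::query()` also accepts a context node as its second argument, so a relative query can replace the second `DomDocument`. The variable names (`$child`, `$descendant`, `$container`, `$relative`) are only illustrative.

```php
<?php
// Sample markup taken from the question.
$html = '
<div id="container">
<li class="list">
Test text
</li>
</div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// "/" matches direct children: the li must sit immediately inside the div.
$child = $xpath->query("//div[@id='container']/li[@class='list']")->item(0);

// "//" matches descendants at any depth, so intermediate elements are allowed.
$descendant = $xpath->query("//div[@id='container']//li[@class='list']")->item(0);

// A relative query (note the leading ".") run against a context node
// avoids re-parsing the fragment into a second DOMDocument.
$container = $xpath->query("//div[@id='container']")->item(0);
$relative = $xpath->query(".//li[@class='list']", $container)->item(0);

// nodeValue keeps the surrounding newlines, so trim before printing.
echo trim($child->nodeValue), "\n";
echo trim($descendant->nodeValue), "\n";
echo trim($relative->nodeValue), "\n";
```

With that in mind, something like `div#container ul.container li.list` would translate to `//div[@id='container']//ul[@class='container']//li[@class='list']`, with no extra documents needed. Keep in mind that `@class='list'` only matches when the attribute is exactly `list`.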