I have HTML with the following structure:
<div id="info">
<h1>Some text</h1>
</div>
<div class="box_con">
<div id="list">
<dl>
<dt>《Library of Heaven's Path》 Volume 1</dt>
<dd> <a style="" href="1300359.html">1 Swindler</a></dd>
</dl>
</div>
</div>
This is my code:
$doc = new DOMDocument;
@$doc->loadHTMLFile($url);
$volume_titles = [];
$title = $doc->getElementById('info')->childNodes->item(1)->nodeValue;
$volume_list = $doc->getElementById('list')->getElementsByTagName('dl');
I don't know how to iterate over this dl element to retrieve the href and text content of each <a>. I have already tried several loops:
$el = $volume_list->firstChild;
do {
var_dump($el);
} while ($el = $el->nextSibling);
$length = $volume_list->length;
for ($i = 0; $i < $length; $i++) {
$node = $volume_list->childNodes->item($i);
var_dump($node);
die();
}
foreach ($volume_list->childNodes as $volume) {
//var_dump($volume_list->getElementsByTagName('dd')->item(0)->nodeValue);
var_dump($volume);
die();
}
But nothing works.
I think the main problem is that getElementsByTagName() returns a list of nodes (actually a DOMNodeList), not a single element. A DOMNodeList has no firstChild or childNodes property, which is why your loops fail. So when you want to access, for example, the first element with that tag, you need to take the first item of the list, much as you would with an array.
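To make the distinction concrete, here is a minimal sketch that inspects what getElementsByTagName() hands back. The inline markup is a cut-down stand-in for your page, loaded with loadHTML() instead of loadHTMLFile() so the snippet runs on its own; $first_dl is a name I made up for illustration.

```php
<?php
// Cut-down stand-in for the page in the question (assumption: same nesting).
$doc = new DOMDocument;
$doc->loadHTML('<div id="list"><dl><dt>Volume 1</dt><dd><a href="1300359.html">1 Swindler</a></dd></dl></div>');

$volume_list = $doc->getElementById('list')->getElementsByTagName('dl');

// The result is a list object, not an element.
var_dump($volume_list instanceof DOMNodeList);

// It exposes a length and an item() method rather than child-node properties.
var_dump($volume_list->length);

// item(0) returns the first matching <dl> as a DOMElement.
$first_dl = $volume_list->item(0);
var_dump($first_dl->nodeName);
```

Once you hold the DOMElement from item(0), properties such as childNodes and firstChild are available on it again.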
If you extend your initial code to reach the nested elements, you could end up with the following, which uses [0] on the result of getElementsByTagName() to pick out the first item:
$title = $doc->getElementById('info')->childNodes->item(1)->nodeValue;
$volume_list = $doc->getElementById('list')->getElementsByTagName('dl');
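// First <dl>, then its first <dd>, then the list of <a> elements inside it.
$a = $volume_list[0]->getElementsByTagName('dd')[0]->getElementsByTagName('a');
echo $a[0]->getAttribute('href'), PHP_EOL; // link target
echo $a[0]->nodeValue, PHP_EOL;            // link text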
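If you want every link rather than only the first, a DOMNodeList can also be walked with foreach. The following is a sketch under a few assumptions: the markup is inlined so the example is self-contained, the second <dd> (1300360.html) is a made-up entry added only to show more than one result, and $chapters is a hypothetical name for the collected output.

```php
<?php
// Sample markup modelled on the question; the second <dd> is invented for illustration.
$html = <<<'HTML'
<div id="info">
<h1>Some text</h1>
</div>
<div class="box_con">
<div id="list">
<dl>
<dt>Volume 1</dt>
<dd> <a style="" href="1300359.html">1 Swindler</a></dd>
<dd> <a style="" href="1300360.html">2 Second chapter</a></dd>
</dl>
</div>
</div>
HTML;

$doc = new DOMDocument;
// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$chapters = [];
// Outer loop: each <dl> inside #list. Inner loop: each <a> anywhere inside that <dl>.
foreach ($doc->getElementById('list')->getElementsByTagName('dl') as $dl) {
    foreach ($dl->getElementsByTagName('a') as $a) {
        $chapters[] = [
            'href' => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
}

print_r($chapters);
```

Calling getElementsByTagName('a') directly on each <dl> skips the intermediate <dd> step, since it searches all descendants; if you need the <dt> volume title as well, you could read it from the same $dl inside the outer loop.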