Tags: php, regex, dom, preg-match, domdocument

PHP - DOMDocument: scrape divs without losing the images


This is my current PHP code:

$dom = new DOMDocument;
@$dom->loadHTML($file);

$xpath = new DOMXPath($dom);
$divs = $xpath->query('//div[@class="test"]');
if ($divs->length > 0) {
    foreach ($divs as $key => $div) {
       print_r($div);
    }
}

Every div also contains an image, which I want to output as well, but DOMDocument appears to be removing it.

The images appear in the HTML file like this:

<img src="loading.gif" data-src="https://test.com/images/images/120/1313131313232.jpg" alt="test" />

I want to output the value of data-src in addition to the text in the div.

Thank you, with best regards


Solution

  • For every div, you could use $div->getElementsByTagName("img") to get its images. Then loop over the images, check whether the alt attribute of each img is test, and read its data-src attribute:

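    // $dom must exist before loadHTML() is called
    $dom = new DOMDocument;
    @$dom->loadHTML($file); // @ suppresses warnings about malformed HTML
    $xpath = new DOMXPath($dom);
    $divs = $xpath->query('//div[@class="test"]');
    foreach ($divs as $div) {
        // Text content of the div
        echo $div->textContent . "<br>";
        // Images nested anywhere inside this div
        foreach ($div->getElementsByTagName("img") as $img) {
            if ($img->getAttribute('alt') === 'test') {
                echo $img->getAttribute('data-src') . "<br>";
            }
        }
    }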
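
    As a variation on the loop above, the images can also be selected with a second XPath query scoped to each div, which avoids depending on the alt text. This is only a sketch under a few assumptions: the sample markup in $file, the "other" div, and its "other.jpg" value are hypothetical stand-ins for the real page, and the $results array is just one illustrative way to collect the output. It also uses libxml_use_internal_errors() rather than @ to keep parser warnings quiet.

```php
<?php
// Hypothetical sample markup standing in for the real contents of $file.
$file = <<<HTML
<div class="test">First text <img src="loading.gif" data-src="https://test.com/images/images/120/1313131313232.jpg" alt="test" /></div>
<div class="other">Ignored <img src="loading.gif" data-src="other.jpg" alt="test" /></div>
HTML;

// Collect libxml warnings internally instead of suppressing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($file);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$results = [];
foreach ($xpath->query('//div[@class="test"]') as $div) {
    $sources = [];
    // Passing $div as the context node makes ".//" search only inside this div.
    foreach ($xpath->query('.//img[@data-src]', $div) as $img) {
        $sources[] = $img->getAttribute('data-src');
    }
    $results[] = [
        'text'   => trim($div->textContent),
        'images' => $sources,
    ];
}

print_r($results);
```

    Collecting the values into an array first, rather than echoing them directly, makes it easier to reuse them later; if the page may contain images without a data-src attribute, the [@data-src] predicate skips them.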