How to extract an image URL from a div?

I want to extract the background image URL from a div with PHP. Given a string of HTML, I want to find the div by its class and pull the URL out of its background-image style.

For example:

<div class="single-post-image" style="background-image: url(https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg)"></div>

This is the output that I want:

https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg

Solution

You could use a straight regex, but personally I would first pull out just the elements you're after with DOMDocument/DOMXPath, then regex the value out of the style attribute.

<?php
$html = '
<html><head></head><body>
<div class="single-post-image" style="background-image:url(https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg)"></div>
<div class="single-post-image" style="background-image: url(https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg )"></div>
<div class="single-post-image" style="background-image: url( https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg)"></div>
<div class="single-post-image" style="background-image: url(\'https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg\')"></div>
<div class="single-post-image"></div>
</body>
</html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

$images = [];
foreach ($xpath->query("//*[contains(@class, 'single-post-image')]") as $img) {
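    if ($img->hasAttribute('style')) {
        // [^)]* stops at the first ")", so a later declaration such as
        // "transform: rotate(3deg)" is not swallowed into the match.
        preg_match('/url\(([^)]*)\)/', $img->getAttribute('style'), $match);
        // Strip optional quotes and padding spaces around the URL.
        if (isset($match[1])) $images[] = trim($match[1], '\'" ');
    }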
}

print_r($images);

Result:

Array
(
    [0] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
    [1] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
    [2] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
    [3] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
)

Example: https://3v4l.org/icTdS

It's a little more code, but I'd like to believe it's more robust and efficient than running a regex over a massive HTML document.
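
If you need this in more than one place, the same idea can be wrapped in a function. The following is only a sketch: the name extractBackgroundUrls and its parameters are made up for illustration, it assumes the class name contains no quote characters, and it only looks at a plain background-image: url(...) declaration. It also matches the class as a whole token, so a class like single-post-image-wide is not picked up by accident.

```php
<?php
// Hypothetical helper (not part of any library): returns the background-image
// URLs of all elements that carry $class as a whole class token.
// Assumes $class contains no quote characters, since it is placed in the XPath.
function extractBackgroundUrls(string $html, string $class): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    $xpath = new DOMXPath($dom);
    // Pad the class list with spaces so only a complete token matches.
    $query = sprintf(
        "//*[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $urls = [];
    foreach ($xpath->query($query) as $node) {
        // getAttribute() returns an empty string when there is no style attribute.
        $style = $node->getAttribute('style');
        if (preg_match('/background-image\s*:\s*url\(\s*([^)]*)\)/i', $style, $m)) {
            $urls[] = trim($m[1], "'\" \t\n\r"); // drop optional quotes and padding
        }
    }
    return $urls;
}

// Usage: the second div has a similar but different class and should be ignored.
$html = '<div class="single-post-image" style="color: red; background-image: url(\'https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg\'); transform: rotate(3deg)"></div>'
      . '<div class="single-post-image-wide" style="background-image: url(https://example.com/other.jpg)"></div>';

print_r(extractBackgroundUrls($html, 'single-post-image'));
```

With the sample above this should print an array holding only the mmowg.net URL, even though the style contains other declarations before and after the background image.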