php css url string-parsing

How to extract an image URL from a div?


I want to extract the background-image URL from a div with PHP. Given an HTML string, I want to find the div by its class and pull the URL out of its background-image style.

For example:

<div class="single-post-image" style="background-image: url(https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg)"></div>

This is the output that I want:

https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg

Solution

  • You could use a plain regex, but personally I would first pull out just the elements you're after with DOMDocument/XPath, then use a regex to extract the value from the style attribute.

    <?php
    $html = '
    <html><head></head><body>
    <div class="single-post-image" style="background-image:url(https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg)"></div>
    <div class="single-post-image" style="background-image: url(https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg )"></div>
    <div class="single-post-image" style="background-image: url( https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg)"></div>
    <div class="single-post-image" style="background-image: url(\'https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg\')"></div>
    <div class="single-post-image"></div>
    </body>
    </html>';
    
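    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them for imperfect HTML.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    
    $xpath = new DOMXPath($dom);
    
    $images = [];
    // Select every element whose class attribute contains the target class name.
    foreach ($xpath->query("//*[contains(@class, 'single-post-image')]") as $img) {
        if ($img->hasAttribute('style')) {
            // Capture everything between "url(" and the first ")" so that later
            // declarations in the same style attribute are not swallowed.
            if (preg_match('/url\(([^)]*)\)/', $img->getAttribute('style'), $match)) {
                // Strip optional quotes and surrounding spaces from the URL.
                $images[] = trim($match[1], '\'" ');
            }
        }
    }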
    
    print_r($images);
    

    Result:

    Array
    (
        [0] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
        [1] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
        [2] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
        [3] => https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg
    )
    

    Example: https://3v4l.org/icTdS

    It is a little more code, but I'd like to believe it's more robust and efficient than running a regex over a massive HTML document.
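
    For comparison, the plain-regex route mentioned at the top could look roughly like the sketch below. The function name `extractBackgroundUrls` is made up for illustration and is not part of the original answer. The pattern also assumes double-quoted attributes, a `div` element, and that `class` appears before `style` in the tag. Those assumptions are the kind of fragility the DOM approach avoids.

```php
<?php
// Hypothetical helper, not from the original answer: pull background-image
// URLs out of an HTML string with one regex instead of DOMDocument/XPath.
// Assumes double-quoted attributes and class="..." appearing before style="...".
function extractBackgroundUrls(string $html, string $class): array
{
    $pattern = '/<div\b[^>]*?\bclass="[^"]*(?<![\w-])'
        . preg_quote($class, '/')                // escape the class name for the regex
        . '(?![\w-])[^"]*"'                      // rest of the class attribute
        . '[^>]*?\bstyle="[^"]*?'                // style attribute later in the same tag
        . 'background-image\s*:\s*url\(\s*\'?'   // url( plus optional space and quote
        . '([^\'")\s]+)/i';                      // the URL: up to a quote, ")" or space

    preg_match_all($pattern, $html, $matches);

    return $matches[1];
}

// Usage with the div from the question:
$html = '<div class="single-post-image" style="background-image: url(https://www.mmowg.net/wp-content/uploads/2020/11/a8Tnv1kVyXY.jpg)"></div>';
print_r(extractBackgroundUrls($html, 'single-post-image'));
```

    If the attribute order changes or the markup uses single-quoted attributes, this pattern silently returns nothing, whereas the DOMDocument version keeps working.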