
Get size of scraped image with domcrawler (Goutte)


For my website, users can submit links.

What I want is this: when a link is submitted, my site parses the DOM of the linked page, finds the largest image on it (where "largest" means the greatest width + height), and saves a thumbnail of that image.

This is so that a thumbnail can be displayed alongside their link.
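Once each image's dimensions are known, the selection rule described above (largest = width + height) reduces to a small pure function. A minimal sketch, assuming the dimensions have already been collected into an array keyed by `src`; the function name `pickLargestImage` is hypothetical and not part of Goutte or Intervention Image:

```php
<?php

/**
 * Hypothetical helper: return the src whose width + height is largest.
 *
 * @param array $sizes map of src => [width, height]
 * @return string|null the winning src, or null if $sizes is empty
 */
function pickLargestImage(array $sizes): ?string
{
    $bestSrc = null;
    $bestTotal = -1;

    foreach ($sizes as $src => [$width, $height]) {
        $total = $width + $height;

        // Strictly greater: on a tie, the first image seen is kept.
        if ($total > $bestTotal) {
            $bestTotal = $total;
            $bestSrc = $src;
        }
    }

    return $bestSrc;
}

// Usage with made-up dimensions:
$sizes = [
    'https://example.com/logo.png'   => [120, 40],
    'https://example.com/photo.jpg'  => [800, 600],
    'https://example.com/banner.gif' => [900, 100],
];

echo pickLargestImage($sizes), PHP_EOL; // prints https://example.com/photo.jpg
```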

To achieve this, I'm using the Goutte package and the Intervention Image package with Laravel.

This is what I've done so far:

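use Goutte\Client;
use GuzzleHttp\Client as GuzzleClient;
use Intervention\Image\Facades\Image;

$goutteClient = new Client();
$guzzleClient = new GuzzleClient(array(
    'timeout' => 15,
));
$goutteClient->setClient($guzzleClient);

$crawler = $goutteClient->request('GET', 'https://www.reddit.com');

// collect the src attribute of every <img> on the page
$result = $crawler
    ->filterXPath('//img')
    ->extract(array('src'));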

foreach ($result as $image) {
    //get the width and height of each $image
}       

//$file = image with the biggest width + height


$thumbnail = Image::make($file);
$thumbnail->resize(900, 900, function ($constraint) {
    $constraint->aspectRatio();
    $constraint->upsize();
});
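As I understand Intervention Image v2's documented behaviour, the two constraints in the resize call combine like this: `aspectRatio()` scales the image so it fits inside the 900x900 box without distortion, and `upsize()` stops smaller images from being enlarged. The function below is a hypothetical plain-PHP sketch of that arithmetic, not the library's own code, and the library's exact rounding may differ:

```php
<?php

/**
 * Hypothetical sketch of "fit inside a box, keep aspect ratio, never upsize".
 * Assumes positive dimensions. Returns [newWidth, newHeight].
 */
function fitWithin(int $width, int $height, int $maxWidth, int $maxHeight): array
{
    // The smaller ratio is the one that makes both sides fit in the box.
    $scale = min($maxWidth / $width, $maxHeight / $height);

    // Cap at 1 so small images keep their original size (the upsize() idea).
    $scale = min($scale, 1.0);

    return [(int) round($width * $scale), (int) round($height * $scale)];
}

[$w, $h] = fitWithin(1800, 1200, 900, 900); // $w = 900, $h = 600
[$w2, $h2] = fitWithin(400, 300, 900, 900); // unchanged: 400, 300
```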

The commented-out parts are what I'm struggling with.

The foreach loop gives me the src of each image, but I don't know how to get that image's properties, such as its width and height.

What is the best way to do this? Saving all the images on the page and THEN viewing their width/height is not an option for me.
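Regarding the constraint above: most image formats store their dimensions in a header at the start of the file, so PHP can report them from bytes held in memory without anything being written to disk. As a small offline illustration, the sketch below passes a made-up, header-only PNG byte string to `getimagesizefromstring()`, the in-memory counterpart of `getimagesize()` that returns the same array format. How you would fetch only the first bytes of a real image (for example an HTTP Range request, which some servers ignore) is an assumption not covered here, and formats such as JPEG may need more than the first few dozen bytes.

```php
<?php

// A made-up, header-only PNG for a 640x480 image: 8-byte signature, IHDR
// chunk length and type, then width, height, bit depth, colour type,
// compression, filter and interlace fields, followed by a dummy CRC.
// No pixel data is included; only the header is needed to read the size.
$png = "\x89PNG\r\n\x1a\n"
    . pack('N', 13) . 'IHDR'
    . pack('NNCCCCC', 640, 480, 8, 2, 0, 0, 0)
    . pack('N', 0);

$attributes = getimagesizefromstring($png);

if ($attributes !== false) {
    $width = $attributes[0];      // index 0 is the width
    $height = $attributes[1];     // index 1 is the height
    $mime = $attributes['mime'];  // e.g. "image/png"

    echo "$width x $height ($mime)", PHP_EOL;
}
```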


Solution

  • I believe you can use getimagesize():

    https://www.php.net/manual/en/function.getimagesize.php

    It returns an array of the image's attributes, including the width (index 0) and height (index 1) you are looking for. If the image is remote, as it is here, this requires allow_url_fopen to be enabled in your PHP configuration. The function reads the image through PHP's URL wrapper, so nothing has to be saved to disk first.

    So in your case, it might look something like this:

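    $files = [];

    // $images is the array of src values ($result in the question).
    // Alternatively, iterate by reference (&$image) and store each total
    // alongside its src in the same array.
    foreach ($images as $image) {
        // @ suppresses the warning PHP raises when an image can't be read
        $attributes = @getimagesize($image);

        // getimagesize() returns false if the URL can't be read or isn't an image
        if ($attributes === false) {
            continue;
        }

        // index 0 is the width, index 1 is the height
        $width = $attributes[0];
        $height = $attributes[1];

        $total = $width + $height;

        // Use the total as the array key. If several images share the same
        // total, the last matching image processed is the one kept.
        $files[$total] = $image;
    }

    // Then use any standard logic to extract the data from the new array,
    // e.g. take the entry with the largest key (null if nothing was readable).
    $file = $files ? $files[max(array_keys($files))] : null;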
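One thing to watch for in both snippets: `extract(array('src'))` returns the `src` attribute exactly as written in the HTML, so values such as `/static/logo.png` or `//cdn.example.com/a.jpg` are not full URLs, and `getimagesize()` would try to open them as local paths. A simplified resolver is sketched below; the name `resolveImageUrl` is hypothetical, and it handles absolute, protocol-relative, root-relative and plain relative paths but does not normalise `../` segments or honour a `<base>` tag, so treat it as a starting point rather than a complete RFC 3986 implementation. Symfony DomCrawler also has `image()`/`images()` methods returning `Image` objects whose `getUri()` resolves against the page URI, which may be preferable if your version provides them.

```php
<?php

/**
 * Hypothetical, simplified resolver for <img src> values.
 * Does not handle "../" normalisation, <base> tags, or query-only references.
 */
function resolveImageUrl(string $src, string $pageUrl): string
{
    // Already absolute (has a scheme such as "https")
    if (parse_url($src, PHP_URL_SCHEME) !== null) {
        return $src;
    }

    $page = parse_url($pageUrl);
    $scheme = $page['scheme'] ?? 'https';
    $host = $page['host'] ?? '';
    $port = isset($page['port']) ? ':' . $page['port'] : '';

    // Protocol-relative: //cdn.example.com/a.png
    if (strpos($src, '//') === 0) {
        return $scheme . ':' . $src;
    }

    // Root-relative: /images/a.png
    if (strpos($src, '/') === 0) {
        return "$scheme://$host$port$src";
    }

    // Relative to the directory of the page's path: images/a.png
    $path = $page['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);

    return "$scheme://$host$port$dir$src";
}
```

In the answer's loop, that would mean calling `$image = resolveImageUrl($image, 'https://www.reddit.com');` before passing `$image` to `getimagesize()`.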