Scraping AliExpress Images using Goutte


I'm struggling to scrape all of the images from an AliExpress search results page. My code gets every alt attribute, but the src of only the first 8 images.

<?php

require 'vendor/autoload.php';

use Goutte\Client;

$url = "https://www.aliexpress.com/af/tie.html?SearchText=tie";

$client = new Client();
$crawler = $client->request('GET', $url);

$output = $crawler->filter('#hs-below-list-items li div div.img.img-border div a img')->each(function ($node) {
    echo '<img src="' . $node->attr('src') . '" alt="' . $node->attr('alt') . '">';
});

var_dump($output);

Could this be something to do with AliExpress lazy loading the images?

Would I need to use something like a headless browser? If so, can you please point me in the right direction?

Any help would be greatly appreciated.

Thanks, Jake.


Solution

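  • The remaining images appear to be lazy-loaded: their URL sits in an image-src attribute instead of src. You need to filter for that attribute itself and read it directly.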

    $output = $crawler->filter('img.picCore[image-src]')->each(function ($node) {
    
        echo '<img src="' . $node->attr('image-src') . '" alt="' . $node->attr('alt') . '">';
    
    });
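
    If you want both the already-loaded and the lazy-loaded images in one pass, a fallback from image-src to src is one option. The sketch below is only an illustration: it uses PHP's built-in DOM extension on made-up markup so it runs without Composer, and imageUrl() is a hypothetical helper, not part of Goutte. The example.com URLs and alt texts are placeholders, not real AliExpress data.

```php
<?php
// Hypothetical markup imitating a lazy-loaded listing: the first image
// has a real src, the second keeps its URL in image-src only.
$html = <<<HTML
<ul>
  <li><img class="picCore" src="https://example.com/a.jpg" alt="Tie A"></li>
  <li><img class="picCore" image-src="https://example.com/b.jpg" alt="Tie B"></li>
</ul>
HTML;

// Hypothetical helper: prefer the lazy-load attribute, fall back to src,
// and return null when neither attribute carries a value.
function imageUrl(DOMElement $img): ?string
{
    foreach (['image-src', 'src'] as $name) {
        if ($img->hasAttribute($name) && $img->getAttribute($name) !== '') {
            return $img->getAttribute($name);
        }
    }
    return null;
}

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Map each image's alt text to the URL that was found for it.
$urls = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $urls[$img->getAttribute('alt')] = imageUrl($img);
}

print_r($urls);
```

    Inside the Goutte closure, the same idea would presumably be $node->attr('image-src') ?? $node->attr('src'), since attr() returns null for a missing attribute, combined with a broader selector such as img.picCore so that images without image-src are not filtered out.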
    

    JH