Tags: php, guzzle6, domcrawler

Guzzle async: process responses as they come in


I've been working on a script that makes close to a thousand async requests using getAsync and Promise\settle(). Each requested page is then parsed using the Symfony DomCrawler's filter() method (also slow, but that's a separate issue).

My code looks something like this:

$requestArray = [];
$client = new Client(['base_uri' => $url]);

foreach ($thousandItemArray as $item) {
    // $query holds the per-item request options
    $requestArray[] = $client->getAsync(null, $query);
}

// Wait for every request to settle, then crawl the responses one by one
$results = Promise\settle($requestArray)->wait(true);
foreach ($results as $result) {
    // settle() yields ['state' => ..., 'value' => $response]; crawl() stands in for the DomCrawler parsing
    $result['value']->crawl();
}

Is there a way I can crawl the requested pages as they come in, rather than waiting for all of them and then crawling? Am I right in thinking this would speed things up, if it's possible?

Thanks for your help in advance.


Solution

  • You can. getAsync() returns a promise, so you can assign an action to it using ->then().

    $promisesList[] = $request->getAsync(/* ... */)->then(
        function (Response $resp) {
            // Do whatever you want right after the response is available.
        }
    );

    $results = Promise\settle($promisesList)->wait(true);
    

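    For example, here is a minimal sketch of moving the crawling into the ->then() callback so each page is parsed as soon as its response arrives. It assumes the responses are HTML, that $query holds the per-item query parameters, and that crawlPage() is a placeholder for whatever you do with the Symfony Crawler's filter():

    use GuzzleHttp\Client;
    use GuzzleHttp\Promise;
    use Psr\Http\Message\ResponseInterface;
    use Symfony\Component\DomCrawler\Crawler;

    $client = new Client(['base_uri' => $url]);
    $promisesList = [];

    foreach ($thousandItemArray as $item) {
        $promisesList[] = $client
            ->getAsync('', ['query' => $query])
            ->then(function (ResponseInterface $resp) {
                // Parse this page immediately, while other requests are still in flight.
                $crawler = new Crawler((string) $resp->getBody());
                crawlPage($crawler); // placeholder for your filter()/extraction code
            });
    }

    // Still wait for everything at the end, but the crawling has already happened.
    Promise\settle($promisesList)->wait();
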
    P.S.

    You probably want to limit the concurrency to some number of requests rather than starting them all at once. If so, use the each_limit() function instead of settle() (see the sketch below). And vote for my PR to be able to use settle_limit() ;)
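
    For reference, a rough sketch of what that could look like with each_limit(). The concurrency of 25 is an arbitrary choice, $query is again assumed to hold the query parameters, and the promises are produced by a generator so that only a limited number of requests are created and in flight at any one time:

    use GuzzleHttp\Client;
    use GuzzleHttp\Promise;
    use Psr\Http\Message\ResponseInterface;

    $client = new Client(['base_uri' => $url]);

    // Lazily yield one promise per item; each_limit() pulls from this
    // generator only when a concurrency slot is free.
    $promiseGenerator = function () use ($client, $thousandItemArray, $query) {
        foreach ($thousandItemArray as $item) {
            yield $client->getAsync('', ['query' => $query]);
        }
    };

    Promise\each_limit(
        $promiseGenerator(),
        25, // maximum number of requests in flight at once
        function (ResponseInterface $resp, $idx) {
            // Crawl each page as soon as its response arrives.
        },
        function ($reason, $idx) {
            // A request failed; log or retry as needed.
        }
    )->wait();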