php arrays goutte domcrawler

Updating an array within an anonymous function not working


I am trying to use a package called Goutte (a PHP scraper/web crawler) like this:

<?php

// Init
require_once 'vendor/autoload.php';
use Goutte\Client;
$client = new Client();
$reviews = array();

// Parse Review Site
$crawler = $client->request('GET', 'http://review-site-url-here');
$crawler->filter('div.review')->each(function($node) use ($reviews)
{
    // Parse Data
    $player_name = $node->filter('tr.switch > td > a')->first()->text();
    // other fields

    // Build Reviews
    array_push($reviews, [
        'player_name' => $player_name,
        // other fields
    ]);
});

// Debug
echo "<pre>";
print_r($reviews);

When this script runs, the $reviews array is always empty. However, if I call print_r inside the anonymous function, it shows only the current element on each iteration. For example, with 4 reviews and this code:

// Parse Review Site
$crawler = $client->request('GET', 'http://review-site-url-here');
$crawler->filter('div.review-BL-mid')->each(function($node) use ($reviews)
{
    // Parse Data
    $player_name = $node->filter('tr.switch > td > a')->first()->text();
    // other fields

    // Build Reviews
    array_push($reviews, [
        'player_name' => $player_name,
        // other fields
    ]);

    // Debug
    print_r($reviews);
});

It outputs like this:

Array
(
    [0] => Array
        (
            [player_name] => aaaa
        )

)
Array
(
    [0] => Array
        (
            [player_name] => bbb
        )

)
Array
(
    [0] => Array
        (
            [player_name] => ccc
        )

)
Array
(
    [0] => Array
        (
            [player_name] => ddd
        )

)

It is as if the array is never updated within the anonymous function. Any idea how to fix this?


Solution

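  • Okay... I realised the issue shortly after posting this and trying out a few things. By default, a PHP closure captures the variables listed in use by value, so every call works on its own copy of the (empty) $reviews array and the outer array is never changed. The array variable $reviews needs to be passed by reference for this to work; i.e.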

    $crawler->filter('div.review')->each(function($node) use (&$reviews) {
       // ...
    });
    

    Hope this helps someone else.
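
To see the difference without Goutte installed, here is a minimal sketch that compares the two capture modes. The helper each_item() is a hypothetical stand-in for the crawler's each() method (it is not part of Goutte), and the names are taken from the sample output in the question.

```php
<?php

// Hypothetical stand-in for the crawler's each(): calls $callback once per item.
function each_item(array $items, callable $callback): void
{
    foreach ($items as $item) {
        $callback($item);
    }
}

$names = ['aaaa', 'bbb', 'ccc', 'ddd'];

// Captured by value: every call starts from a copy of the empty array,
// so the outer $byValue is never modified.
$byValue = [];
each_item($names, function ($name) use ($byValue) {
    $byValue[] = ['player_name' => $name];
});

// Captured by reference: the closure appends to the outer array itself.
$byReference = [];
each_item($names, function ($name) use (&$byReference) {
    $byReference[] = ['player_name' => $name];
});

echo count($byValue), "\n";      // prints 0
echo count($byReference), "\n";  // prints 4
```

If you would rather avoid the reference, Symfony's DomCrawler each() (which Goutte uses) also returns an array of whatever the closure returns, so returning the row from the closure and assigning the result of each() to $reviews should give the same array.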