I am trying to use a package called Goutte (a PHP scraper/web crawler) like this:
<?php
// Init
require_once 'vendor/autoload.php';
use Goutte\Client;
$client = new Client();
$reviews = array();
// Parse Review Site
$crawler = $client->request('GET', 'http://review-site-url-here');
$crawler->filter('div.review')->each(function($node) use ($reviews)
{
    // Parse Data
    $player_name = $node->filter('tr.switch > td > a')->first()->text();
    // other fields

    // Build Reviews
    array_push($reviews, [
        'player_name' => $player_name,
        // other fields
    ]);
});
// Debug
echo "<pre>";
print_r($reviews);
When this script runs, the $reviews array is always empty. However, if I call print_r inside the anonymous function, it only shows the current element on each iteration. For example, with 4 reviews, I tried this:
// Parse Review Site
$crawler = $client->request('GET', 'http://review-site-url-here');
$crawler->filter('div.review-BL-mid')->each(function($node) use ($reviews)
{
    // Parse Data
    $player_name = $node->filter('tr.switch > td > a')->first()->text();
    // other fields

    // Build Reviews
    array_push($reviews, [
        'player_name' => $player_name,
        // other fields
    ]);

    // Debug
    print_r($reviews);
});
It outputs like this:
Array
(
[0] => Array
(
[player_name] => aaaa
)
)
Array
(
[0] => Array
(
[player_name] => bbb
)
)
Array
(
[0] => Array
(
[player_name] => ccc
)
)
Array
(
[0] => Array
(
[player_name] => ddd
)
)
It is as if the outer array is never updated from within the anonymous function. Any idea how to fix this?
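Okay... I realised the issue soon after posting this and trying out a few things. A closure's use clause copies the variable by value, so array_push was only modifying a copy inside the anonymous function, and that copy starts out empty again on every call. I need to pass the array variable $reviews by reference for this to work, i.e.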
$crawler->filter('div.review')->each(function($node) use (&$reviews) {
// ...
});
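To see the difference without Goutte, here is a minimal sketch that should run on its own. The $names array and the variable names $byValue, $byReference, $collectByValue and $collectByReference are made up for illustration; a plain foreach stands in for the crawler's each() call.

```php
<?php
// Hypothetical stand-in for the scraped player names (not real data).
$names = ['aaaa', 'bbb', 'ccc', 'ddd'];

// Captured by value: the closure works on its own copy of $byValue.
$byValue = [];
$collectByValue = function ($name) use ($byValue) {
    $byValue[] = ['player_name' => $name];
};

// Captured by reference: the closure writes to the outer $byReference.
$byReference = [];
$collectByReference = function ($name) use (&$byReference) {
    $byReference[] = ['player_name' => $name];
};

// Call both closures once per name, as each() would do per node.
foreach ($names as $name) {
    $collectByValue($name);
    $collectByReference($name);
}

echo count($byValue), "\n";     // outer array is untouched
echo count($byReference), "\n"; // outer array received every entry
```

Only the array captured with & ends up holding the four entries; the one captured by value stays empty outside the closure, which matches what I saw with $reviews.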
Hope this helps someone else.