I'm using Laravel 5 with Goutte for crawling, and I want to extract only the links I need, not all of them, so a basic regex comes into play. The regex itself is fine (I tested it online and it works), but when I apply it in my controller I get an error. Here is what I tried:
<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;
use App\Http\Requests;
use App\Http\Controllers\Controller;
use Goutte\Client;

class RvnController extends Controller
{
    public function index()
    {
        $client = new Client();
        $crawler = $client->request('GET', 'http://www.jgpnis.rs/index.php/red-voznje-preuzimanje-pregled.html');
        $regex_rvn_links = "/http:\/\/www.jgpnis.rs\/red_voznje\/([a-zA-Z0-9\-])+\/([a-zA-Z0-9\-\.])+/";
        $links_array = array();
        $crawler->filter('a')->each(function ($node) use($links_array) {
            if (preg_match($regex_rvn_links , $node->link()->getUri())) {
                $links_array[] = $node->link()->getUri();
            }
        });
        dd($links_array);
    }
}
The error is: ErrorException in RvnController.php line 27: Undefined variable: regex_rvn_links.
I tried to get around that error by writing the regex directly in preg_match, and that works, but my $links_array is empty. To be clear, if I print $node->link()->getUri() instead of pushing it to the array, I get about 15 links, so the matching itself works. But I can't use it like that; I need the array. So my question is how to use variables in this situation, because neither of them works. What am I missing?
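You are using a closure, so you need to pass the variables into it correctly. A PHP closure does not see variables from the enclosing scope unless they are listed in its use clause.
1) $regex_rvn_links must be passed into the closure. Leaving it out of use is the cause of the exception.
2) $links_array must be passed by reference (&$links_array). Without the &, the closure appends to its own copy, which is why you got an empty array.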
$crawler->filter('a')->each(function ($node) use (&$links_array, $regex_rvn_links) {
    if (preg_match($regex_rvn_links , $node->link()->getUri())) {
        $links_array[] = $node->link()->getUri();
    }
});
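To see the by-value versus by-reference difference without Goutte, here is a minimal standalone sketch. The $uris list is made-up sample data standing in for what the crawler would return, and the variable names ($pattern, $byValue, $byReference, $collectByValue, $collectByReference) are only illustrative. The pattern is a slightly tightened variant of the one in the question (dots escaped, anchored), which is an assumption about what you want to match, not a required change.

```php
<?php
// Hypothetical sample URIs standing in for the crawler's results.
$uris = [
    'http://www.jgpnis.rs/red_voznje/linija-1/radni-dan.pdf',
    'http://www.jgpnis.rs/index.php/kontakt.html',
    'http://www.jgpnis.rs/red_voznje/linija-2/subota.pdf',
];

// Assumed pattern: same idea as in the question, with dots escaped and anchors added.
$pattern = '~^http://www\.jgpnis\.rs/red_voznje/[a-zA-Z0-9\-]+/[a-zA-Z0-9\-.]+$~';

// Captured by value: the closure works on its own copy of $byValue.
$byValue = [];
$collectByValue = function ($uri) use ($byValue, $pattern) {
    if (preg_match($pattern, $uri)) {
        $byValue[] = $uri; // changes only the closure's private copy
    }
};

// Captured by reference: the closure appends to the outer $byReference.
$byReference = [];
$collectByReference = function ($uri) use (&$byReference, $pattern) {
    if (preg_match($pattern, $uri)) {
        $byReference[] = $uri;
    }
};

foreach ($uris as $uri) {
    $collectByValue($uri);
    $collectByReference($uri);
}

var_dump(count($byValue));     // int(0)
var_dump(count($byReference)); // int(2)
```

The pattern is captured by value in both closures, which is fine because it is only read; only the array that has to be modified from inside the closure needs the &.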