
Using phpcrawl with Laravel 5.4


I am trying to use cuab's PHPCrawl within Laravel 5.4 and have installed it via Composer using this package: https://packagist.org/packages/mmerian/phpcrawl

I have tried running this sample code:

class MyCrawler extends PHPCrawler  
{ 
  function handleDocumentInfo($DocInfo)  
  { 
    if (PHP_SAPI == "cli"){
      $lb = "\n"; 
    } else {
      $lb = "<br />"; 
    }
    echo "Page requested: ".$DocInfo->url." (".$DocInfo->http_status_code.")".$lb; 
    echo "Referer-page: ".$DocInfo->referer_url.$lb; 

    if ($DocInfo->received == true) {
      echo "Content received: ".$DocInfo->bytes_received." bytes".$lb; 
    }
    else {
      echo "Content not received".$lb;  
    }

    echo $lb; 
    flush(); 
  }  
} 

$crawler = new MyCrawler(); 
$crawler->setURL("www.php.net"); 
$crawler->addContentTypeReceiveRule("#text/html#"); 
$crawler->addURLFilterRule("#\.(jpg|jpeg|gif|png)$# i"); 
$crawler->enableCookieHandling(true);  
$crawler->setTrafficLimit(1000 * 1024); 
$crawler->go(); 
$report = $crawler->getProcessReport(); 

if (PHP_SAPI == "cli") {
  $lb = "\n"; 
} else {
  $lb = "<br />"; 
}

echo "Summary:".$lb; 
echo "Links followed: ".$report->links_followed.$lb; 
echo "Documents received: ".$report->files_received.$lb; 
echo "Bytes received: ".$report->bytes_received." bytes".$lb; 
echo "Process runtime: ".$report->process_runtime." sec".$lb; 

But it throws multiple errors like this:

Class 'App\Http\Controllers\PHPCrawler' not found

How do I reference the correct namespace so that I can use the script within Laravel?


Solution

  • You should add the files via a class map in your composer.json file. The project's GitHub page shows an example of the class map; at the time of writing it looks like this:

    "autoload": {
        "classmap": [
            "libs/Utils/PHPCrawlerUtils.class.php",
            "libs"
        ]
    }
    

    You'll have to prefix each path with vendor/mmerian/phpcrawl/ (or whatever the path to the libs folder is in your project), then run composer dump-autoload so Composer regenerates its class map.
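The class map only makes the PHPCrawl classes loadable; the error message itself comes from PHP's name resolution. Assuming, as in the original non-namespaced PHPCrawl 0.8x sources, that PHPCrawler is declared in the global namespace, a bare PHPCrawler written inside namespace App\Http\Controllers resolves to App\Http\Controllers\PHPCrawler. The controller then also needs use PHPCrawler; near the top, or \PHPCrawler in the extends clause. The minimal sketch below only demonstrates how PHP resolves the three spellings. It uses ::class, which is resolved at compile time, so it runs without the library installed.

```php
<?php
// Sketch of class-name resolution inside a Laravel controller namespace.
// Assumption: PHPCrawler is a global (non-namespaced) class. ::class is
// resolved at compile time, so no PHPCrawler class has to be loaded here.
namespace App\Http\Controllers;

// Import the global PHPCrawler class into this file.
use PHPCrawler;

// With the import in place, the unqualified name points at the global class.
$imported = PHPCrawler::class;

// A leading backslash always means "global namespace", with or without an import.
$fullyQualified = \PHPCrawler::class;

// namespace\ forces resolution relative to the current namespace. This is
// what a bare PHPCrawler meant before the use statement was added, and it
// matches the class name reported in the error.
$relative = namespace\PHPCrawler::class;

echo $imported, PHP_EOL;        // PHPCrawler
echo $fullyQualified, PHP_EOL;  // PHPCrawler
echo $relative, PHP_EOL;        // App\Http\Controllers\PHPCrawler
```

In the controller from the question, adding use PHPCrawler; below the namespace line (or writing class MyCrawler extends \PHPCrawler) should clear the "Class 'App\Http\Controllers\PHPCrawler' not found" error, provided the class map entries are in place and the autoloader has been regenerated.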