Tags: php, symfony, web-scraping, css-selectors, domcrawler

Can't select link


I'm attempting to scrape the href of the link in each .row. Ultimately, I'd like to click the link and access the DOM it links to, but I can't get either a Link object or the href attribute.

Not sure whether the fact that the a elements don't have any text in them is an issue, but that's the DOM I have to work with.

Help?

<?php require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!doctype html>
<html>
  <body>
    <div class="content">
      <p class="row"><a href="/uri1"></a></p> 
      <p class="row"><a href="/uri2"></a></p> 
      <p class="row"><a href="/uri3"></a></p> 
    </div>
  </body>
</html>
HTML;

$dom = new Crawler($html);

$content = $dom->filter('.row');
$rows = [];

foreach ($content as $element)
{
    $node = new Crawler($element);
    $link = $node->filter('a');
    echo $link->html(); // Empty?

    try 
    {
        $link = $node->selectLink('')->link();
        echo $link->getUri();
    } 
    catch (Exception $ex) 
    {
        // Throws, once per row: Current URI must be an absolute URL ("").
        echo $ex->getMessage();
    }

}

Solution

  • I use XPath to filter DOM elements with DomCrawler because it gives me more control over what I'm filtering. The code below should echo the URLs in your HTML.

    $crawler = new Crawler($html);

    $crawler->filterXPath("//p[@class='row']")->each(function (Crawler $node, $i) {
        // Select the href attribute node of the link inside this row and read its value.
        $url = $node->filterXPath("//a/@href")->text();
        echo $url . PHP_EOL;
    });
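
If you would rather stay with CSS selectors, here is a minimal sketch that reads the raw href with attr() and then builds Link objects with links(). It assumes the exception in the question comes from creating the Crawler without a base URI, which Link needs in order to resolve relative hrefs such as /uri1. The address http://example.com/ is only a placeholder for wherever the HTML was fetched from, not a value from the question. The empty output from html() is probably expected, since the a elements have no content.

```php
<?php
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<!doctype html>
<html>
  <body>
    <div class="content">
      <p class="row"><a href="/uri1"></a></p>
      <p class="row"><a href="/uri2"></a></p>
      <p class="row"><a href="/uri3"></a></p>
    </div>
  </body>
</html>
HTML;

// Assumption: placeholder base URI standing in for the page's real address.
$crawler = new Crawler($html, 'http://example.com/');

// attr() returns the attribute exactly as written in the markup.
$hrefs = $crawler->filter('.row a')->each(function (Crawler $a) {
    return $a->attr('href');
});

// links() returns Link objects; getUri() resolves each href against the base URI.
$uris = [];
foreach ($crawler->filter('.row a')->links() as $link) {
    $uris[] = $link->getUri();
}

print_r($hrefs);
print_r($uris);
```

Each resolved URI can then be passed to whatever HTTP client you use, and the response body loaded into a new Crawler to work with the linked page's DOM.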