php goutte domcrawler

How to extract data with Goutte Crawler?


The code below returns the hrefs to the content pages. Now I want to extract the content from those pages and send it to my view. These are the divs I need to extract:

<div class="c_pad">
  <div class="c_label">
    <span class="std_header2">Contact:</span>
  </div>
  <div class="c_name">
    <span class="std_text_b">Monkey</span>
  </div>
  <div class="clear"></div>
</div>

<div class="c_pad">
  <div class="c_label">
    <span class="std_header2">Phone number:</span>
  </div>
  <div class="c_phone">
    <span class="std_text_b">001111111</span>
  </div>
  <div class="clear"></div>
</div>

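use Goutte\Client;

for ($i = 0; $i <= 1; $i++) {
    $p = new Client();
    // $link is the base listing URL, defined elsewhere.
    $d = $p->request('GET', $link . '&std=1&results=' . $i);
    $d->filter('a[class="o_title"]')->each(function ($node) {
        // Follow each result link to its detail page.
        $pp = new Client();
        $dd = $pp->request('GET', $node->attr('href'));
        // use ($node) makes the outer $node visible inside the inner closure.
        $dd->filter('div[id="adv_desc"]')->each(function ($tekst) use ($node) {
            echo $node->attr('href') . '<br>' . $tekst->text();
        });
    });
}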

Solution

  • You want to select specific tags by their attributes.

    However, you are using $d->filter('a[class="o_title"]'), which selects a tags with the attribute class="o_title". Those tags are not part of the content you want.

    You simply need to adjust your node filter so that it selects the correct elements, such as the c_name and c_phone divs from your markup.

    filter() accepts CSS selectors, much like the jQuery selector syntax: https://api.jquery.com/category/selectors/

    See the node-filtering section of the documentation for Symfony's DomCrawler, which Goutte uses: http://symfony.com/doc/current/components/dom_crawler.html#node-filtering
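
    As a rough illustration of adjusting the filter, here is a minimal sketch that runs DomCrawler directly on the markup from the question. It assumes symfony/dom-crawler and symfony/css-selector are installed via Composer. The helper name firstText and the selectors div.c_name span.std_text_b and div.c_phone span.std_text_b are my own guesses based on the posted HTML, not something confirmed against the real page.

```php
<?php
// Assumption: symfony/dom-crawler and symfony/css-selector are installed via Composer.
require 'vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical helper: trimmed text of the first match, or null when nothing matches.
// Checking count() first avoids the exception text() throws on an empty node list.
function firstText(Crawler $crawler, string $selector): ?string
{
    $nodes = $crawler->filter($selector);
    return $nodes->count() > 0 ? trim($nodes->first()->text()) : null;
}

// The markup from the question, used here in place of a fetched page.
$html = <<<'HTML'
<div class="c_pad">
  <div class="c_label"><span class="std_header2">Contact:</span></div>
  <div class="c_name"><span class="std_text_b">Monkey</span></div>
  <div class="clear"></div>
</div>
<div class="c_pad">
  <div class="c_label"><span class="std_header2">Phone number:</span></div>
  <div class="c_phone"><span class="std_text_b">001111111</span></div>
  <div class="clear"></div>
</div>
HTML;

$page = new Crawler($html);

// Selectors are assumptions derived from the class names in the posted markup.
$contact = [
    'name'  => firstText($page, 'div.c_name span.std_text_b'),
    'phone' => firstText($page, 'div.c_phone span.std_text_b'),
];

print_r($contact);
```

    Since Goutte's request() returns a DomCrawler Crawler, the same filter() calls should work on $dd inside your each() callback. You can then collect the resulting arrays and pass them to your view instead of echoing them.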