Tags: php, laravel, web-scraping, domcrawler

DomCrawler get element contents after specific element


I'm trying to get the contents of the element that comes right after another element. Here's some example markup:

<header>2010</header>
<div>
    <a href="">Some data</a>
    <a href="">Some data</a>
</div>
<header>2011</header>
<div>
    <a href="">Some data</a>
    <a href="">Some data</a>
</div>

I need the data grouped by year. I've tried the following, but for 2010 it returns the data for all years.

$crawler->filter('header')->each(function(Crawler $c) {
    $year = $c->text();
    $next = $c->nextAll();
    $next->filter('div a')->each(function($node){
        $node->text();
    });
});

How do I make it stop after collecting the a elements in the div between two headers?


Solution

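  • nextAll() returns every sibling that follows the header, so for 2010 it also includes the 2011 div. In your case you can take only the first node from nextAll(); since that first sibling is the div, filter just the a elements inside it: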

    $crawler->filter('header')->each(function(Crawler $c) {
        $year = $c->text();
        dump($year);
        $next = $c->nextAll()->first();
        $next->filter('a')->each(function($node){
            dump($node->text());
        });
    });
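
If you want the links collected into an array keyed by year rather than dumped, a minimal sketch could look like the one below. It assumes symfony/dom-crawler and symfony/css-selector are installed through Composer, and the sample markup, link texts and the $byYear variable are made up for illustration. It relies on each() returning an array of the callback's return values.

```php
<?php
// Assumption: symfony/dom-crawler and symfony/css-selector were installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

// Hypothetical sample markup with the same shape as in the question.
$html = <<<'HTML'
<body>
<header>2010</header>
<div>
    <a href="">First 2010 link</a>
    <a href="">Second 2010 link</a>
</div>
<header>2011</header>
<div>
    <a href="">First 2011 link</a>
</div>
</body>
HTML;

$crawler = new Crawler($html);

$byYear = [];
$crawler->filter('header')->each(function (Crawler $header) use (&$byYear) {
    $year = trim($header->text());

    // nextAll() yields every following sibling element;
    // first() keeps only the one directly after this header.
    $block = $header->nextAll()->first();

    // each() returns an array of the callback's return values.
    $byYear[$year] = $block->filter('a')->each(function (Crawler $a) {
        return $a->text();
    });
});

print_r($byYear);
```

With this sample markup, $byYear should hold two link texts under 2010 and one under 2011. Note that this only works when the div is the very next sibling of each header; if other elements can sit in between, first() will pick up the wrong node.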