php dom curl web-crawler domcrawler

How do I crawl and extract data from every page of a paginated listing?


I want to extract every name="" attribute from a website.

Example HTML:

<div class="link_row">
    <a href="" class="listing_container" name="7777">link</a>
</div>

I have the following code:

<?php
$html = new DOMDocument();
@$html->loadHtmlFile('http://www.onedomain.com/plus?ca=11_c&o=1');
$xpath = new DOMXPath( $html );
$nodelist = $xpath->query( "//div[@class='link_row']/a[@class='listing_container']/@name" );
foreach ($nodelist as $n){
    echo $n->nodeValue."\n<br>";
}
?>

Result is:

7777

This code works fine, but it should not be limited to a single page number.

In http://www.onedomain.com/plus?ca=11_c&o=1 the pager parameter is "o=1".

Once o=1 is finished, I would like to continue with o=2 and so on, up to the value of my variable $last=556, i.e. http://www.onedomain.com/plus?ca=11_c&o=556

Could you help me? What is the best way to do this?

Thanks


Solution

  • Use a for (or while) loop. $last is not defined in the code you posted, so I've hard-coded the upper bound as the last page plus one (557).

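    // Reuse one DOMDocument; each loadHTMLFile() call replaces its contents.
    $html = new DOMDocument();
    // Pages o=1 through o=556 (the loop stops before 557).
    for ($i = 1; $i < 557; $i++) {
        // @ suppresses the warnings libxml emits for malformed HTML.
        @$html->loadHTMLFile('http://www.onedomain.com/plus?ca=11_c&o=' . $i);
        $xpath = new DOMXPath($html);
        $nodelist = $xpath->query("//div[@class='link_row']/a[@class='listing_container']/@name");
        foreach ($nodelist as $n) {
            echo $n->nodeValue . "\n<br>";
        }
    }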
    

    Simpler example:

    for($i =1; $i < 557; $i++) {
        echo $i;
    }
    

    http://php.net/manual/en/control-structures.for.php
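
If you would rather drive the loop from your $last variable and collect the values instead of echoing them, something along these lines might work. This is only a sketch: extractNames() and crawlPages() are hypothetical helper names (not from your code), it assumes allow_url_fopen is enabled so file_get_contents() can fetch URLs, and it uses libxml_use_internal_errors() instead of @ to silence parser warnings.

```php
<?php
// Hypothetical helper: returns every name attribute matched in one page of HTML.
function extractNames(string $htmlSource): array
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $names = [];

    // loadHTML() rejects an empty string, so guard against it.
    if ($htmlSource !== '' && $doc->loadHTML($htmlSource)) {
        $xpath = new DOMXPath($doc);
        $query = "//div[@class='link_row']/a[@class='listing_container']/@name";
        foreach ($xpath->query($query) as $attr) {
            $names[] = $attr->nodeValue;
        }
    }

    // Restore the previous libxml error handling.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $names;
}

// Hypothetical helper: walks pages o=1 .. o=$last and merges the results.
function crawlPages(string $baseUrl, int $last): array
{
    $all = [];
    for ($page = 1; $page <= $last; $page++) {
        // Assumes allow_url_fopen is on; @ hides the warning for a failed download.
        $source = @file_get_contents($baseUrl . $page);
        if ($source === false) {
            continue; // skip pages that could not be downloaded
        }
        $all = array_merge($all, extractNames($source));
    }
    return $all;
}
```

With your values, the call would look like `$names = crawlPages('http://www.onedomain.com/plus?ca=11_c&o=', 556);`. Separating the parsing from the downloading also lets you try extractNames() on a saved HTML snippet before crawling all 556 pages.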