
strpos problem: getting the value UBLIC returned


I am making a class that opens a web page and stores the href values of all outbound links on it. For some reason it works for the first 3 links and then goes weird. Below is my code:

class Crawler {
    var $url;

    function construct($url) {
        $this->url = 'http://'.$url;
        $this->crawl();
    }

    function crawl() {
        $str = file_get_contents($this->url);
        $start = 0;
        for($i=0; $i<10; $i++) {
            $beg = strpos($str, '<a href="http://',$start)+16;
            $end = strpos($str,'"',$beg);
            $diff = $end - $beg;
            $links[$i] = substr($str,$beg, $diff);
            $start = $start + $beg;
        }
        print_r($links);
    }
}

$crawler = new Crawler;
$crawler->construct('www.yahoo.com');

Ignore the for loop for the time being; I know it will only return the first 10 links and won't cover the whole document. But if you run this code, the first 3 values are fine and all the others are UBLIC. Can anyone help? Thanks


Solution

  • Instead of:

    $start = $start + $beg;
    

    try:

    $start = $beg;
    

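    $beg is already an absolute offset into $str, so adding it to $start makes the search position jump ahead further on every iteration, skipping links and, within a few iterations, running past the last link and then past the end of the string. That's likely why you are only seeing the first three matches. From then on strpos() returns FALSE, FALSE + 16 evaluates to 16, and on a page that starts with <!DOCTYPE html PUBLIC ... offset 16 is the U of PUBLIC, which is presumably where UBLIC comes from.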

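    Also, you need to check that strpos() did not return FALSE before adding 16 to it. Once 16 is added, FALSE becomes 16, so a check on $beg can never succeed:

    for($i=0; $i<10; $i++) {
        $pos = strpos($str, '<a href="http://',$start);
        if ($pos === FALSE)
            break;
        $beg = $pos + 16;
        //...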
    

    Note, however, that you really should be using DOMDocument to find all elements with a given tag name (here, a), for example with its getElementsByTagName method. In particular, because this is HTML that might not be valid XHTML, you should load it with the loadHTML method rather than parsing it as XML.
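Putting both fixes together, a self-contained sketch of the strpos() loop might look like this. It assumes PHP 7 or later; extractLinks() and the HTML strings are hypothetical stand-ins so it runs without fetching a page, and like the original it stores each value without the leading http://. The last few lines reproduce the UBLIC value using a page that contains no link at all. On PHP 8 an offset past the end of the string makes strpos() throw a ValueError instead of returning FALSE, so the reproduction avoids that case.

```php
<?php
// Minimal sketch of the strpos()-based loop with both fixes applied.
// extractLinks() is a hypothetical helper; the HTML below is made-up sample data.

function extractLinks(string $html, int $max = 10): array
{
    $needle = '<a href="http://';
    $links  = [];
    $start  = 0;
    for ($i = 0; $i < $max; $i++) {
        // Test strpos()'s own result: after "+ 16", FALSE would become 16.
        $pos = strpos($html, $needle, $start);
        if ($pos === false) {
            break;
        }
        $beg = $pos + strlen($needle);   // first character after "http://"
        $end = strpos($html, '"', $beg); // closing quote of the href value
        if ($end === false) {
            break;
        }
        $links[] = substr($html, $beg, $end - $beg);
        // $end is already an absolute offset, so resume the search there
        // instead of adding it to the previous $start.
        $start = $end;
    }
    return $links;
}

$html = '<p><a href="http://a.example/">A</a> <a href="/local">L</a> '
      . '<a href="http://b.example/x">B</a></p>';
print_r(extractLinks($html)); // a.example/ and b.example/x

// Where "UBLIC" comes from: with no match strpos() returns FALSE, and
// FALSE + 16 evaluates to 16. On a page that opens with an HTML 4 doctype,
// offset 16 falls just after "<!DOCTYPE html P".
$page = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">';
$beg  = strpos($page, '<a href="http://') + 16;
$end  = strpos($page, '"', $beg);
var_dump(substr($page, $beg, $end - $beg)); // string(6) "UBLIC "
```

Resuming from $end rather than $beg is a small design choice; either works, since both are absolute offsets past the current match.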
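For the DOMDocument route, a minimal sketch might look like the following. It assumes the dom extension is enabled; outboundLinks() is a hypothetical helper, and treating every absolute http:// or https:// href as outbound is a simplification (a stricter crawler would compare each link's host with the crawled site's).

```php
<?php
// Sketch of the DOMDocument approach: parse the page with loadHTML() and
// collect the href of every <a> element that points to an absolute http(s) URL.
// outboundLinks() is a hypothetical helper name.

function outboundLinks(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often not well-formed; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href'); // "" when the attribute is missing
        // Keep absolute http:// and https:// links only.
        if (preg_match('#^https?://#i', $href)) {
            $links[] = $href;
        }
    }
    return $links;
}

// In the crawler, $html would come from file_get_contents($this->url).
$html = '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">'
      . '<p><a href="http://a.example/">A</a><a href="/local">L</a>'
      . '<a href=https://b.example/x>unquoted</a><br></p>';
print_r(outboundLinks($html));
```

Because the parser handles the markup, links the strpos() needle would miss (unquoted attributes, other attributes before href, https:// URLs) are picked up as well.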