Search code examples
phpexecution-time

How do I fix max execution time errors in PHP?


I have a PHP page that I run every minute through a CRON job.

I have been running it for quite some time but suddenly it started throwing up these errors:

Maximum execution time of 30 seconds exceeded in /home2/sharingi/public_html/scrape/functions.php on line 84

The line number will vary with each error, ranging from line 70 up into the 90s.

Here is the code from lines 0-95

function crawl_page( $base_url, $target_url, $userAgent, $links)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL,$target_url);
    curl_setopt($ch, CURLOPT_FAILONERROR, false);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 100);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops

    $html = curl_exec($ch);

    if (!$html) 
    {
        echo "<br />cURL error number:" .curl_errno($ch);
        echo "<br />cURL error:" . curl_error($ch);
        //exit;
    }

    //
    // load scrapped data into the DOM
    //

    $dom = new DOMDocument();
    @$dom->loadHTML($html);

    //
    // get only LINKS from the DOM with XPath
    //

    $xpath = new DOMXPath($dom);
    $hrefs = $xpath->evaluate("/html/body//a");

    //
    // go through all the links and store to db or whatever
    //  

    for ($i = 0; $i < $hrefs->length; $i++) 
    {
        $href = $hrefs->item($i);
        $url = $href->getAttribute('href');

        //if the $url does not contain the web site base address: http://www.thesite.com/ then add it onto the front

        $clean_link = clean_url( $base_url, $url, $target_url);
        $clean_link = str_replace( "http://" , "" , $clean_link);
        $clean_link = str_replace( "//" , "/" , $clean_link);

        $links[] = $clean_link;

        //removes empty array values

        foreach($links as $key => $value) 
        { 
            if($value == "") 
            { 
                unset($links[$key]); 
            } 
        } 
        $links = array_values($links); 

        //removes javascript lines

        foreach ($links as $key => $value)
        {
            if ( strpos( $value , "javascript:") !== FALSE )
            {
                unset($links[$key]);
            }
        }
        $links = array_values($links);

        // removes @ lines (email)

        foreach ($links as $key => $value)
        {
            if ( strpos( $value , "@") !== FALSE || strpos( $value, 'mailto:') !== FALSE)
            {
                unset($links[$key]);
            }
        }
        $links = array_values($links);
    }   

    return $links; 
}

What is causing these errors, and how can I prevent them?


Solution

  • You should set the max_execution time using the set_time_limit function. If you want infinite time (most likely your case), use:

    set_time_limit(0);