I have a PHP page that I run every minute through a cron job.
It had been running fine for quite some time, but it suddenly started throwing errors like this one:
Maximum execution time of 30 seconds exceeded in /home2/sharingi/public_html/scrape/functions.php on line 84
The line number will vary with each error, ranging from line 70 up into the 90s.
Here is the code from lines 0-95:
function crawl_page( $base_url, $target_url, $userAgent, $links)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_URL,$target_url);
    curl_setopt($ch, CURLOPT_FAILONERROR, false);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_AUTOREFERER, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 100);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10); //follow up to 10 redirections - avoids loops
    $html = curl_exec($ch);
    if (!$html)
    {
        echo "<br />cURL error number:" .curl_errno($ch);
        echo "<br />cURL error:" . curl_error($ch);
        //exit;
    }
    //
    // load scraped data into the DOM
    //
    $dom = new DOMDocument();
    @$dom->loadHTML($html);
    //
    // get only LINKS from the DOM with XPath
    //
    $xpath = new DOMXPath($dom);
    $hrefs = $xpath->evaluate("/html/body//a");
    //
    // go through all the links and store to db or whatever
    //
    for ($i = 0; $i < $hrefs->length; $i++)
    {
        $href = $hrefs->item($i);
        $url = $href->getAttribute('href');
        //if the $url does not contain the web site base address: http://www.thesite.com/ then add it onto the front
        $clean_link = clean_url( $base_url, $url, $target_url);
        $clean_link = str_replace( "http://" , "" , $clean_link);
        $clean_link = str_replace( "//" , "/" , $clean_link);
        $links[] = $clean_link;
        //removes empty array values
        foreach($links as $key => $value)
        {
            if($value == "")
            {
                unset($links[$key]);
            }
        }
        $links = array_values($links);
        //removes javascript lines
        foreach ($links as $key => $value)
        {
            if ( strpos( $value , "javascript:") !== FALSE )
            {
                unset($links[$key]);
            }
        }
        $links = array_values($links);
        // removes @ lines (email)
        foreach ($links as $key => $value)
        {
            if ( strpos( $value , "@") !== FALSE || strpos( $value, 'mailto:') !== FALSE)
            {
                unset($links[$key]);
            }
        }
        $links = array_values($links);
    }
    return $links;
}
What is causing these errors, and how can I prevent them?
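The script is exceeding PHP's max_execution_time setting, which the error message shows is 30 seconds on your server. You can raise it at runtime with the set_time_limit function. If you want no limit at all (most likely your case), call this before the long-running work starts: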
set_time_limit(0);
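Raising the limit may only treat the symptom. Judging from the posted function, the reported line numbers (70 up into the 90s) appear to fall in the three clean-up loops, which sit inside the for loop and so rescan the whole $links array once per anchor. Below is a minimal sketch of doing that filtering in a single pass after the for loop has finished. It assumes a hypothetical helper named filter_links, which is not part of the original code, and the sample URLs are made up for illustration:

```php
<?php
// Hypothetical helper (not in the original post): removes empty values,
// javascript: pseudo-links and e-mail links in one pass over the array.
function filter_links(array $links): array
{
    $kept = array_filter($links, function ($link) {
        $link = (string) $link;
        if ($link === "") {
            return false; // empty value
        }
        if (strpos($link, "javascript:") !== false) {
            return false; // javascript: pseudo-link
        }
        if (strpos($link, "@") !== false || strpos($link, "mailto:") !== false) {
            return false; // e-mail link
        }
        return true;
    });

    // array_filter preserves keys, so reindex from 0 as the original code does.
    return array_values($kept);
}

// Lift the limit before the long-running work starts (0 means no limit).
set_time_limit(0);

// Illustrative input only; in crawl_page() this would be the $links array
// built by the for loop, filtered once just before "return $links;".
$links = filter_links([
    "www.example.com/a",
    "",
    "javascript:void(0)",
    "mailto:me@example.com",
    "www.example.com/b",
]);
print_r($links);
```

Since the job runs every minute, a finite value such as set_time_limit(50) might be a safer choice than 0, so that a stuck run cannot pile up behind the next one. That figure is only an illustration and would need tuning for your pages.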