I'm trying to scrape a website using cURL and PHP. I have to log in first, but that isn't the problem.
I log in using cookies and then navigate to a list of products. The products are rendered server-side with PHP on their site, not with JavaScript.
But when I request the list with cURL, the site says the brand/search couldn't be found (No Results Returned). I have already changed the referrer and the Host header.
How could they detect this, and is there a way to 'bypass' it? I have a CSV file of their products (which I got from them), but it lacks the amount, price, and description, so I want to fill those in myself.
Here is my script:
include('brands.php');
$request = array(
'username'=>'******',
'pass'=>'*********',
'submit'=>'',
'part-submit'=>'',
'referlink'=>'',
'remember'=>1
);
$agent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.website.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt ($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt ($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($request));
$response = curl_exec($ch);
curl_close ($ch);
$ch = curl_init();
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
curl_setopt($ch, CURLOPT_URL, $array[$_GET['k']]."&rpp=100");
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: website.com'));
curl_setopt ($ch, CURLOPT_REFERER, "http://www.website.com/linecard.php");
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt ($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt ($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_POST, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$response = curl_exec($ch);
var_dump(curl_getinfo($ch, CURLINFO_HEADER_OUT ));
$curl_errno = curl_errno($ch);
$curl_error = curl_error($ch);
curl_close($ch);
if ($curl_errno > 0) {
die("cURL Error ($curl_errno): $curl_error\n");
}
echo $response;
Thanks in advance!
PS: I removed the official website URL, for their security and to keep this out of Google hits. I will provide it if needed.
The answer is actually fairly simple.
My connection to the site was being closed after every cURL request.
I don't want that, so the solution is to remove the curl_close($ch); call that comes right after I log in to the website.
Then everything works fine!
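For anyone who wants to see the idea in code, here is a minimal sketch of logging in and fetching the product list with one cURL handle, so the session cookies stay in that handle's memory between the two requests. The function names (`loginOptions`, `pageOptions`, `fetchWithSession`) and the way the options are split up are my own assumptions for illustration, not part of the original script. Passing an empty string to `CURLOPT_COOKIEFILE` is used here to switch on cookie handling without reading a file.

```php
<?php
// Sketch only: the function names below are hypothetical helpers.

// Options for the login POST.
function loginOptions(string $loginUrl, array $fields): array
{
    return [
        CURLOPT_URL        => $loginUrl,
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($fields),
    ];
}

// Options for the follow-up GET that runs on the same handle.
function pageOptions(string $pageUrl, string $referer): array
{
    return [
        CURLOPT_URL     => $pageUrl,
        CURLOPT_HTTPGET => true, // switch the reused handle from POST back to GET
        CURLOPT_REFERER => $referer,
    ];
}

// Log in, then fetch a page, without closing the handle in between.
function fetchWithSession(
    string $loginUrl,
    array $fields,
    string $pageUrl,
    string $referer,
    string $agent
): string {
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $agent,
        CURLOPT_COOKIEFILE     => '', // empty string: enable cookies, read no file
    ]);

    // Request 1: log in. Cookies set by the server are kept in the handle.
    curl_setopt_array($ch, loginOptions($loginUrl, $fields));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login request failed: ' . curl_error($ch));
    }

    // No curl_close() here: request 2 reuses the handle and its cookies.
    curl_setopt_array($ch, pageOptions($pageUrl, $referer));
    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException('Page request failed: ' . curl_error($ch));
    }

    // Close only after the last request.
    curl_close($ch);
    return $response;
}
```

With the variables from the question, the call would look something like `echo fetchWithSession('http://www.website.com/', $request, $array[$_GET['k']] . '&rpp=100', 'http://www.website.com/linecard.php', $agent);`.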