php web-scraping curl cookies

cURL-scraped page not showing search entries (not a JS/cookie issue)


I'm trying to scrape a website using cURL and PHP. I have to log in first, but that isn't the problem.

I log in using cookies and then navigate to a list of products. These products are simply printed with PHP on their site, so not with JavaScript.

But when I use cURL, the site says the brand/search couldn't be found (No Results Returned). I have already changed the referrer and host headers.

How could they detect this, and is there a way to 'bypass' it? I got a CSV file of their products (from them), but it lacks the amount, price and description, so I want to fill that part in myself.

Here is my script:

include('brands.php');

$request = array(
    'username'=>'******',
    'pass'=>'*********',
    'submit'=>'',
    'part-submit'=>'',
    'referlink'=>'',
    'remember'=>1
);
$agent = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0";

// Step 1: log in with a POST request and store the cookies in cookie.txt
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.website.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($request));
$response = curl_exec($ch);
curl_close($ch);

// Step 2: request the product list with a new handle
$ch = curl_init();
curl_setopt($ch, CURLOPT_VERBOSE, TRUE);
curl_setopt($ch, CURLOPT_URL, $array[$_GET['k']]."&rpp=100");
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Host: website.com'));
curl_setopt($ch, CURLOPT_REFERER, "http://www.website.com/linecard.php");
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_POST, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
$response = curl_exec($ch);
// Dump the request headers that were actually sent
var_dump(curl_getinfo($ch, CURLINFO_HEADER_OUT));
$curl_errno = curl_errno($ch);
$curl_error = curl_error($ch);
curl_close($ch);

if ($curl_errno > 0) {
    die("cURL Error ($curl_errno): $curl_error\n");
}

echo $response;

Thanks in advance!

P.S. I removed the official website's URL and will provide it if needed. This is for their security and to keep it out of Google hits.


Solution

  • The answer is actually fairly simple: my connection to the site was closed after every cURL request. I don't want that, so the solution to this problem is removing the curl_close($ch); call after I log in to the website.

    Then everything works fine!
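
    One way to read that fix is to keep a single cURL handle open for both the login and the product request. The following is only a minimal sketch under that assumption, not the exact script that was used: the helper names (common_options, login_options, listing_options, fetch_listing_after_login) and the parameters $loginUrl, $productUrl and $cookieFile are hypothetical, and the URLs in the usage note are the same placeholders as in the question.

```php
<?php
// Sketch only: helper names and parameters are illustrative assumptions.

/** Options shared by the login request and the product-list request. */
function common_options(string $cookieFile, string $agent): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_USERAGENT      => $agent,
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is freed
        CURLOPT_COOKIEFILE     => $cookieFile, // enables cURL's in-memory cookie engine
    ];
}

/** Options for the login POST. */
function login_options(string $loginUrl, array $fields): array
{
    return [
        CURLOPT_URL        => $loginUrl,
        CURLOPT_POST       => true,
        CURLOPT_POSTFIELDS => http_build_query($fields),
    ];
}

/** Options for the follow-up GET of the product list. */
function listing_options(string $productUrl, string $referer): array
{
    return [
        CURLOPT_URL     => $productUrl,
        CURLOPT_HTTPGET => true, // switch the reused handle back from POST to GET
        CURLOPT_REFERER => $referer,
    ];
}

/** Log in and fetch the product list on the SAME handle. */
function fetch_listing_after_login(
    string $loginUrl,
    array $fields,
    string $productUrl,
    string $referer,
    string $agent,
    string $cookieFile = 'cookie.txt'
): string {
    $ch = curl_init();
    curl_setopt_array($ch, common_options($cookieFile, $agent));

    // Request 1: log in. Note that there is no curl_close() afterwards.
    curl_setopt_array($ch, login_options($loginUrl, $fields));
    if (curl_exec($ch) === false) {
        throw new RuntimeException('Login failed: ' . curl_error($ch));
    }

    // Request 2: reuse the handle, so the cookies from the login are still in memory.
    curl_setopt_array($ch, listing_options($productUrl, $referer));
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('Listing failed: ' . curl_error($ch));
    }

    curl_close($ch); // close only once, after the last request
    return $html;
}
```

    With the values from the question, the call might look like fetch_listing_after_login("http://www.website.com/", $request, $array[$_GET['k']]."&rpp=100", "http://www.website.com/linecard.php", $agent). The sketch deliberately leaves out CURLOPT_COOKIESESSION, which is documented to make cURL ignore session cookies loaded from the cookie file. Whether that option contributed to the original problem is an assumption, not something the answer confirms.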