I'm trying to get a simple string describing an image that I look up with Google's search-by-image. So I set up my search_by_google.php page:
<?php
$url = $_REQUEST['url'];
if(empty($_REQUEST['raw'])){
    $raw = false;
}
else{
    $raw = true;
}
echo fetch_google($url, $raw);
function fetch_google($u, $raw, $terms="sample search",$numpages=1,$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0')
{
    $ch = curl_init();
    $url = 'http://www.google.com/imghp?hl=en&tab=wi';
    curl_setopt ($ch, CURLOPT_URL, $url);
    curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt ($ch, CURLOPT_HEADER, TRUE);
    curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, TRUE);
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt ($ch, CURLOPT_VERBOSE,true);
    curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
    curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT,120);
    curl_setopt ($ch, CURLOPT_TIMEOUT,120);
    curl_setopt ($ch, CURLOPT_MAXREDIRS,10);
    curl_setopt ($ch, CURLOPT_COOKIEFILE,"./cookie.txt");
    curl_setopt ($ch, CURLOPT_COOKIEJAR,"./cookie.txt");
    curl_setopt ($ch, CURLOPT_VERBOSE,true);
    curl_exec($ch);
    $searched="";
    for($i=0;$i<=$numpages;$i++)
    {
        $ch = curl_init();
        $url="http://www.google.com/searchbyimage?hl=en&image_url=".urlencode($u);
        curl_setopt ($ch, CURLOPT_URL, $url);
        curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
        curl_setopt ($ch, CURLOPT_HEADER, TRUE);
        curl_setopt ($ch, CURLOPT_FOLLOWLOCATION, TRUE);
        curl_setopt ($ch, CURLOPT_RETURNTRANSFER, TRUE);
        curl_setopt ($ch, CURLOPT_VERBOSE,true);
        curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/imghp?hl=en&tab=wi');
        curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT,120);
        curl_setopt ($ch, CURLOPT_TIMEOUT,120);
        curl_setopt ($ch, CURLOPT_MAXREDIRS,10);
        curl_setopt ($ch, CURLOPT_COOKIEFILE,"cookie.txt");
        curl_setopt ($ch, CURLOPT_COOKIEJAR,"cookie.txt");
        $searched=$searched.curl_exec ($ch);
        curl_close ($ch);
    }
    if($raw){
        return $searched;
    }
    else{
        $matches = array();
        preg_match('/Best guess for this image:[^<]+<a[^>]+>([^<]+)/', $searched, $matches);
        return (count($matches) > 1 ? $matches[1] : false);
    }
}
?>
I've tried changing all the cURL options, but when I go to http://www.mysite.altervista.org/search_by_google.php?url=http://www.mysite.org/asdasd.jpg&raw=false
it keeps saying 302 Moved.
I then changed my code by adding
curl_setopt ($ch, CURLOPT_HEADER, TRUE);
to the second curl_init() block, and now it gives me this message:
EDIT 25/03/2014 19:34
I changed my code as Sabuj Hassan suggested, and the log is now:
HTTP/1.0 302 Found Cache-Control: public, max-age=21600 Date: Tue, 25 Mar 2014 18:30:07 GMT Age: 16 Location: http://www.google.com/search?tbs=sbi:AMhZZisAo2ZcfY19aFUJcEj26M4zKc9ZuxzfsUPzLuUJk-pd-siPwiplqIcGN5tW1XPU16-XFg1EoK7jc5IU3BKoEHYnwZo7RmuhyF5p9qaZwSgq4FKRkNW44JgzTi4Mr8g6ezNMQ6YzaAEQ-uFbPMNzY40NrE3uB7ePm4BGNowF34PiIjLOiVLkWwQ7sRoBVMoVgzBbAP7rDwHee5LyGF8Dq6QOT1TEhsURduPD6exzITyRl77agELdpTFSi-JXDncI6c4KdcuQYSx2LknnIW6nippmpPf3X5OYGn1CFZw13rlFPitLSY0Ang0COuSXKdpBy6B8Dak9QZNZ9VFB4HBRfnMFiyuBvQtyhAg2LeOnRbjnunGB0P1RlwKBF4hRId7wUdTu4Dfab5DQu9hGauLKcd7GcP4g-jQXx_1gymwDdZnPXLzZp1mkjVMX9GFSppj-IRWp3FVVqChsPEzKXdraevuWJFukjUdF87dU_1kLKO23lC8L3kusy05zcq7ZxyF1dHNfQ0vYJeWumtbRosJNuEcqiSyVW_1-bF104HMJLdCA0gr5VyIZolkcZok4W1sgjFYTWvfj6f0proaGE24HSO4Ov2hmhAy9HQUCr3e-KjgqyP4AOtlmI3VsuLu34jKSo0t4tWbb5PVBi1_1oebuv4oisdVdw22a6CRH2tiw8wg6Ya1VgxsXhyj8U7lrQ8cBHVDKlOI6EimXtnELBHyDNQT1Zpsz1hK10GYvFaRNMFd7Rqmg87CLdycgyRV-sYxNWxIu9agNgHTwuU1W-GgeWWcM9noeMwgqMKSGh9lt_1hda3ZWrcA4Y1MeiG55b4ZYvOjcm9t9iIy6LA2S4AjC2X1qZHvJtSqzgfOz8yTuX5jUHqCl0jI1FdOSmqZV1GqQ0uaJfsuchlsWUULfUJBzFiGkAuOqIzU0bpXLNqLHoYPJUPwr66H6jWPFLsWAS9_1GRNj70s30jfbzcS0NUShUvE2meUhlpx-f5M0nmS0zvf-3OQOUkXlYO2VUZ4x9y8G76hHoTkDxqzhhGrgohyFmkUvAWmSkHTBpbP6gek8cyrmBnXuedSV3r2O71G8CUbdHFxfIO8FWlkGj1cUYu60PoKF6hndjZsOlV-dSNXfOTKeC1jPtf5ycXA0s0xLK7_1K0iWxhfmVq62WgQ4O3Prc4b6bcJm8M1Q9xZhhsElisuUyVTN9-dDMNUZ1h0tUe9oGsZYLh9vjEsMokqBXFM_1igHOfgRn4I17Xt8EBMZI9cEjakByjv-g5Pt9tG69RQm765HLhf8VpafvE5Z3BwDpZs4x5uMkVDURT9qcA&hl=en Server: quimby_frontend Content-Length: 1566 Content-Type: text/html; charset=UTF-8 Expires: Wed, 26 Mar 2014 00:30:07 GMT Alternate-Protocol: 80:quic X-Content-Type-Options: nosniff X-Frame-Options: SAMEORIGIN X-XSS-Protection: 1; mode=block
302 Moved
The document has moved here. HTTP/1.0 302 Found Cache-Control: public, max-age=21600 Date: Tue, 25 Mar 2014 18:30:07 GMT Age: 16 Location: http://www.google.com/search?tbs=sbi:AMhZZisAo2ZcfY19aFUJcEj26M4zKc9ZuxzfsUPzLuUJk-pd-siPwiplqIcGN5tW1XPU16-XFg1EoK7jc5IU3BKoEHYnwZo7RmuhyF5p9qaZwSgq4FKRkNW44JgzTi4Mr8g6ezNMQ6YzaAEQ-uFbPMNzY40NrE3uB7ePm4BGNowF34PiIjLOiVLkWwQ7sRoBVMoVgzBbAP7rDwHee5LyGF8Dq6QOT1TEhsURduPD6exzITyRl77agELdpTFSi-JXDncI6c4KdcuQYSx2LknnIW6nippmpPf3X5OYGn1CFZw13rlFPitLSY0Ang0COuSXKdpBy6B8Dak9QZNZ9VFB4HBRfnMFiyuBvQtyhAg2LeOnRbjnunGB0P1RlwKBF4hRId7wUdTu4Dfab5DQu9hGauLKcd7GcP4g-jQXx_1gymwDdZnPXLzZp1mkjVMX9GFSppj-IRWp3FVVqChsPEzKXdraevuWJFukjUdF87dU_1kLKO23lC8L3kusy05zcq7ZxyF1dHNfQ0vYJeWumtbRosJNuEcqiSyVW_1-bF104HMJLdCA0gr5VyIZolkcZok4W1sgjFYTWvfj6f0proaGE24HSO4Ov2hmhAy9HQUCr3e-KjgqyP4AOtlmI3VsuLu34jKSo0t4tWbb5PVBi1_1oebuv4oisdVdw22a6CRH2tiw8wg6Ya1VgxsXhyj8U7lrQ8cBHVDKlOI6EimXtnELBHyDNQT1Zpsz1hK10GYvFaRNMFd7Rqmg87CLdycgyRV-sYxNWxIu9agNgHTwuU1W-GgeWWcM9noeMwgqMKSGh9lt_1hda3ZWrcA4Y1MeiG55b4ZYvOjcm9t9iIy6LA2S4AjC2X1qZHvJtSqzgfOz8yTuX5jUHqCl0jI1FdOSmqZV1GqQ0uaJfsuchlsWUULfUJBzFiGkAuOqIzU0bpXLNqLHoYPJUPwr66H6jWPFLsWAS9_1GRNj70s30jfbzcS0NUShUvE2meUhlpx-f5M0nmS0zvf-3OQOUkXlYO2VUZ4x9y8G76hHoTkDxqzhhGrgohyFmkUvAWmSkHTBpbP6gek8cyrmBnXuedSV3r2O71G8CUbdHFxfIO8FWlkGj1cUYu60PoKF6hndjZsOlV-dSNXfOTKeC1jPtf5ycXA0s0xLK7_1K0iWxhfmVq62WgQ4O3Prc4b6bcJm8M1Q9xZhhsElisuUyVTN9-dDMNUZ1h0tUe9oGsZYLh9vjEsMokqBXFM_1igHOfgRn4I17Xt8EBMZI9cEjakByjv-g5Pt9tG69RQm765HLhf8VpafvE5Z3BwDpZs4x5uMkVDURT9qcA&hl=en Server: quimby_frontend Content-Length: 1566 Content-Type: text/html; charset=UTF-8 Expires: Wed, 26 Mar 2014 00:30:07 GMT Alternate-Protocol: 80:quic X-Content-Type-Options: nosniff X-Frame-Options: SAMEORIGIN X-XSS-Protection: 1; mode=block
302 Moved
The document has moved here.
Following redirects may be blocked for cURL on your server (for instance, older PHP versions refuse to enable CURLOPT_FOLLOWLOCATION when safe_mode or open_basedir is in effect). So I recommend handling the redirect manually, like this:
First, the curl function. You can add other cURL options if you like:
function curl($url, $user_agent, $retry=0){
    // give up after 5 manual redirects so a redirect loop cannot recurse forever
    if($retry > 5){
        print "Maximum 5 retries are done, skipping!\n";
        return "in loop!";
    }
    $ch = curl_init();
    curl_setopt ($ch, CURLOPT_URL, $url);
    curl_setopt ($ch, CURLOPT_USERAGENT, $user_agent);
    curl_setopt ($ch, CURLOPT_HEADER, TRUE); // keep the response headers in the output
    curl_setopt ($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt ($ch, CURLOPT_REFERER, 'http://www.google.com/');
    curl_setopt ($ch, CURLOPT_COOKIEFILE,"./cookie.txt");
    curl_setopt ($ch, CURLOPT_COOKIEJAR,"./cookie.txt");
    curl_setopt ($ch, CURLOPT_SSL_VERIFYPEER, false); // note: this turns off certificate verification
    $result = curl_exec($ch);
    curl_close($ch);
    // handling the follow redirect: header names are case-insensitive,
    // and the match is anchored to the start of a line
    if(preg_match("|^Location:\s*(https?://\S+)|mi", $result, $m)){
        print "Manually doing follow redirect!\n$m[1]\n";
        return curl($m[1], $user_agent, $retry + 1);
    }
    // add another condition here if the location is like Location: /home/products/index.php
    return $result;
}
And here is how it should be called:
$response = curl("http://www.google.com/", "Mozilla 5.0");
print "$response\n";
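Because CURLOPT_HEADER is on, the response returned by curl() contains the headers followed by the body, so a search over the whole string could in principle also hit a "Location:" that appears in the page itself. Here is a minimal sketch of restricting the lookup to the header block. The helper name extract_location() is hypothetical (it is not part of the code above), and it assumes the response contains a single header block terminated by a blank line:

```php
<?php
// Hypothetical helper (not part of the original answer): return the value of
// the Location header from a raw response fetched with CURLOPT_HEADER enabled,
// or null if the header block has no Location header.
function extract_location($raw_response) {
    // Headers end at the first blank line; everything after it is the body.
    $parts = explode("\r\n\r\n", $raw_response, 2);
    $headers = $parts[0];
    // Header names are case-insensitive, so match "location:" in any case.
    if (preg_match('/^Location:\s*(\S+)/mi', $headers, $m)) {
        return $m[1];
    }
    return null;
}

// Made-up response for illustration: the body mentions "Location:" too,
// but only the real header is picked up.
$raw = "HTTP/1.0 302 Found\r\n"
     . "Location: http://www.google.com/search?hl=en\r\n"
     . "Content-Type: text/html; charset=UTF-8\r\n"
     . "\r\n"
     . "<html>The document has moved. Location: http://example.com/</html>";

print extract_location($raw) . "\n";
```

Inside curl(), the value returned by such a helper could replace the preg_match() on the full $result.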
I parse the redirect target from the Location: header. The link may not start with http:// (for example, it can be a relative path). In that case, add another condition there.
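One possible shape for that extra condition is sketched below. The function name resolve_location() is hypothetical, and this is a simplified sketch rather than a full URL resolver: it handles absolute, scheme-relative, root-relative and plain relative values, but does not normalize "../" segments.

```php
<?php
// Hypothetical helper: turn the value of a Location header into an absolute
// URL, using the URL that was just requested as the base.
function resolve_location($base_url, $location) {
    // Already absolute: use it as is.
    if (preg_match('|^https?://|i', $location)) {
        return $location;
    }
    $base = parse_url($base_url);
    $origin = $base['scheme'] . '://' . $base['host']
            . (isset($base['port']) ? ':' . $base['port'] : '');
    // Scheme-relative ("//host/path"): reuse the scheme of the base URL.
    if (substr($location, 0, 2) === '//') {
        return $base['scheme'] . ':' . $location;
    }
    // Root-relative ("/home/products/index.php"): append to scheme and host.
    if (substr($location, 0, 1) === '/') {
        return $origin . $location;
    }
    // Otherwise treat it as relative to the directory of the base path.
    $path = isset($base['path']) ? $base['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $location;
}

print resolve_location('http://www.google.com/imghp?hl=en', '/home/products/index.php') . "\n";
```

To use it, the pattern in curl() would have to capture any Location value (not only ones starting with http), and the recursive call would become something like curl(resolve_location($url, $m[1]), $user_agent, $retry + 1).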