
"Check position in Google" script works with "google.com" but not "google.pl" - is my server banned?


I have written a PHP script based on a piece of code I found using Google. Its purpose is to check a particular site's position in Google for a given keyword. First, it prepares an appropriate URL to query Google (something like: "http://www.google.com/search?q=the+keyword&ie=utf-8&oe=utf-8&num=50"), then it downloads the source of the page at that URL. After that, it determines the position using regular expressions and knowledge of which div classes Google uses for results.

The script works fine when the URL I download from is in the "google.com" domain. But since it's intended to check positions for Polish users, I would like it to use "google.pl". I wouldn't care, but the search results can vary considerably between the two (by more than 100 positions). Unfortunately, when I try to use the "pl" domain, cURL just doesn't return anything (it waits for the timeout first). However, when I ran my script on another server, it worked perfectly with both the "google.com" and "google.pl" domains. Do you have any idea why something like this can happen? Is it possible that my server was banned from querying the "google.pl" domain?

Here is my cURL code:

private function cURL($url)
{
    $ch = curl_init($url);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
    curl_setopt($ch,CURLOPT_CONNECTTIMEOUT,5);
    return curl_exec($ch);
    curl_close($ch);  
}

Solution

  • First of all, I cannot reproduce your problem. I used the following 3 cURL commands to simulate your situation:

    curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.5 (KHTML, like Gecko) Version/5.1 Safari/534.51.3" "http://www.google.com/search?q=the+keyword"
    curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.5 (KHTML, like Gecko) Version/5.1 Safari/534.51.3" "http://www.google.pl/search?q=the+keyword"
    curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.5 (KHTML, like Gecko) Version/5.1 Safari/534.51.3" "http://www.google.nl/search?q=the+keyword"
    

    The first one is .com, which serves as the reference point. It works.
    The second one is .pl, because this is where you are encountering problems. This also just works for me.
    The third one is .nl, because this is where I live (so basically what .pl is for you). This too just works for me.


    I'm not sure, but this could be one possible explanation:

    • Google.com is international: when I search at google.nl, for example, I still end up at google.com/search?q=... (the only difference is the additional lang-param).
    • google.nl/search?q=... redirects to google.com with a 302, so its own response body is empty.
    • I don't know for sure, but it is possible that cURL doesn't follow redirects unless you set an additional flag.
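To test the redirect theory from PHP, a minimal sketch along these lines may help. In PHP's cURL binding, redirects are only followed when CURLOPT_FOLLOWLOCATION is set, so a 302 response would otherwise come back with an empty body. The function name `fetchFollowingRedirects` and the timeout values are assumptions made for illustration, not part of the original script.

```php
<?php
// Hypothetical helper: fetch a URL, follow redirects, and report what happened.
function fetchFollowingRedirects(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow Location headers (301/302)
        CURLOPT_MAXREDIRS      => 5,    // assumed limit, guards against redirect loops
        CURLOPT_CONNECTTIMEOUT => 5,    // same connect timeout as in the question
        CURLOPT_TIMEOUT        => 15,   // assumed overall limit for the transfer
    ]);

    $body = curl_exec($ch); // string on success, false on failure

    $result = [
        'body'      => $body,
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'redirects' => curl_getinfo($ch, CURLINFO_REDIRECT_COUNT),
        'error'     => curl_error($ch), // empty string when nothing went wrong
    ];

    curl_close($ch);
    return $result;
}

// Example call (performs a real request, so it is left commented out):
// print_r(fetchFollowingRedirects('http://www.google.pl/search?q=the+keyword'));
```

If `redirects` is greater than zero and `final_url` points at another domain, the redirect explanation fits. If `error` reports a timeout instead, the cause lies elsewhere.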

    If this is true (which I'll check now), you need to use google.com as the domain and add an additional lang-param, instead of using google.pl.

    Your other server may work because its cURL configuration or version is different.
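As a rough illustration of that workaround, the search URL could be built as below. This is only a sketch: it assumes the "lang-param" is Google's `hl` query parameter, the function name `buildSearchUrl` is made up for the example, and nothing here guarantees that the results match what google.pl returns.

```php
<?php
// Hypothetical helper: build a google.com search URL with a language hint.
function buildSearchUrl(string $keyword, string $lang = 'pl', int $num = 50): string
{
    $params = [
        'q'   => $keyword,
        'hl'  => $lang,   // assumption: "hl" is the lang-param mentioned above
        'ie'  => 'utf-8', // same encoding parameters as the URL in the question
        'oe'  => 'utf-8',
        'num' => $num,
    ];

    // The separator is passed explicitly so the result does not depend on php.ini.
    return 'http://www.google.com/search?' . http_build_query($params, '', '&');
}

echo buildSearchUrl('the keyword'), "\n";
// prints http://www.google.com/search?q=the+keyword&hl=pl&ie=utf-8&oe=utf-8&num=50
```

Using `http_build_query` also takes care of escaping keywords that contain spaces or non-ASCII characters.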


    Also, Google blocks cURL's default user-agent string, so I'd suggest changing it to something like:

    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.5 (KHTML, like Gecko) Version/5.1 Safari/534.51.3
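In PHP, the user agent can be set with the CURLOPT_USERAGENT option. The following is a minimal sketch. The function name `fetchWithUserAgent` is an assumption for illustration, and whether a browser-like string is enough to avoid being blocked is not something this code can guarantee.

```php
<?php
// Hypothetical helper: download a page while sending a browser-like User-Agent header.
function fetchWithUserAgent(string $url, string $userAgent)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // replaces cURL's default user agent

    $html = curl_exec($ch); // string on success, false on failure
    curl_close($ch);

    return $html;
}

// Example call (performs a real request, so it is left commented out):
// $ua = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.5 '
//     . '(KHTML, like Gecko) Version/5.1 Safari/534.51.3';
// $html = fetchWithUserAgent('http://www.google.pl/search?q=the+keyword', $ua);
```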
    

    This has nothing to do with the problems you're encountering, but you never actually close your cURL handle, because you return before closing it (everything after return ... is skipped).
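One way to fix that is to store the result first, close the handle, and return last. The sketch below rewrites the method from the question as a plain function so that it runs on its own. The name `fetchPage` and the error logging are additions for illustration, the latter only to make a silent timeout easier to diagnose.

```php
<?php
// Sketch of the question's cURL() method, written as a standalone function.
function fetchPage(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);

    $result = curl_exec($ch); // string on success, false on failure

    if ($result === false) {
        // Added for diagnosis: log why the request failed (timeout, DNS, ...).
        error_log('cURL error ' . curl_errno($ch) . ': ' . curl_error($ch));
    }

    curl_close($ch); // now reached, because the return comes afterwards
    return $result;
}
```

Inside the original class, the same body can be kept under `private function cURL($url)`.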