
Ignoring crawlers as visitors to website


When each page on my website loads, I use the same PHP code snippet to add the visitor to a MySQL database as a new visitor, or to update their existing entry if they have visited before. I use cookies to tell new visitors from returning ones. Before running that snippet, I use the code below to check whether the request comes from a crawler rather than a human.

However, this does not work: I still get database entries from Googlebot and Facebook, so the function seems to always return False. Could someone tell me what I'm doing wrong?

function getIsCrawler() {
    $agents = array(
        "Google",
        "google", 
        "facebook", 
        "Facebook", 
        "Bing", 
        "bing",
        "yahoo",
        "Yahoo",
        "Twitter",
        "twitter",
        "Instagram",
        "instagram"
    );
    foreach ($agents as $agent)   
    {
        if(strpos($_SERVER['HTTP_USER_AGENT'], $agent))
        {
            return True;
        }
    }
    return False;
}

$iscrawler = getIsCrawler();

if ($isCrawler == False) 
{
    // run PHP code snippet to handle visitors
}

Solution

  • There are a few things that you'll probably want to look at. First, you can make this function easier to test by passing the user agent string in as a parameter instead of reading $_SERVER inside it. That removes the tight coupling between the function and the web request.

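    Second, strpos() and its relatives are notorious for their return values: they return the offset of the match, which is 0 when the match is at the very start of the string, or false when there is no match. Because 0 is falsy, a bare if (strpos(...)) treats a match at offset 0 as no match, so compare the result against false with !== instead. There are big red(ish) warnings about this in the docs.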
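
    To make the pitfall concrete, here is a minimal sketch. The user agent string is an illustrative sample chosen so that the match lands at offset 0; it is an assumption, not a value taken from the question's logs.

    ```php
    <?php
    // Illustrative user agent string (an assumption for this sketch).
    $userAgent = "facebookexternalhit/1.1";

    // strpos() returns the offset of the match, or false when there is none.
    $position = strpos($userAgent, "facebook");   // match at the very start

    // A bare truthiness check cannot tell offset 0 from "not found".
    $looseCheck = (bool) $position;

    // A strict comparison separates "found at offset 0" from "not found".
    $strictCheck = ($position !== false);

    var_dump($position, $looseCheck, $strictCheck);
    ```

    Here the loose check reports no match even though the needle is present, while the strict check reports the match.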

    Finally, you can halve the size of the $agents array by using stripos(), which is the case-insensitive version of strpos().

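    function getIsCrawler($external_agent) {
        // One entry per crawler is enough because stripos() ignores case.
        $agents = array(
            "Google",
            "Facebook",
            "Bing",
            "Yahoo",
            "Twitter",
            "Instagram",
        );
        foreach ($agents as $agent) {
            // Strict comparison: offset 0 is a valid match, only false means "not found".
            if (stripos($external_agent, $agent) !== false) {
                return true;
            }
        }
        return false;
    }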
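
    One way to call the function from a page might look like the sketch below. The function body is repeated in compact form only so that the snippet runs on its own, and the empty-string fallback for a missing User-Agent header is an assumption about how you want to treat such requests. Note also that the question assigns $iscrawler but then tests $isCrawler. PHP variable names are case-sensitive, so the tested variable is undefined, the comparison with False succeeds, and the visitor code runs for every request, crawlers included.

    ```php
    <?php
    // Same logic as the function above, repeated so this sketch is self-contained.
    function getIsCrawler($external_agent) {
        $agents = array("Google", "Facebook", "Bing", "Yahoo", "Twitter", "Instagram");
        foreach ($agents as $agent) {
            if (stripos($external_agent, $agent) !== false) {
                return true;
            }
        }
        return false;
    }

    // Some clients send no User-Agent header, so fall back to an empty string
    // (an assumption for this sketch; an empty string never matches).
    $userAgent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';

    // Use exactly the same spelling in the assignment and in the test.
    $isCrawler = getIsCrawler($userAgent);

    if ($isCrawler === false) {
        // run the visitor-tracking snippet here
    }
    ```

    Because the user agent is now a parameter, the function can also be checked with plain strings outside of a web request.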