Tags: php, web, webserver, cloaking

What is the mechanism behind cloaking?


According to this source, "Cloaking refers to the practice of presenting different content or URLs to human users and search engines." The same source gives this example:

Serving a page of HTML text to search engines, while showing a page of images or Flash to users

Question: If I have interpreted this correctly, the web server must have some mechanism for identifying whether the requesting entity is a search engine or a user's browser. What is this mechanism called? Or is it just PHP or JavaScript code that redirects? How does a web server actually know that entity X is a search engine and entity Y is a web browser?


Solution

  • The user agent is a good way to identify the client.

    This is the user agent string passed to the server on a request from a browser:

    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"

    From Google:

    Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

    From Bing:

    Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)

    To use them with PHP, you may do something like this:

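    // The header is sent by the client and may be missing, so default to an empty string
    $userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    // stripos() is case-insensitive, so it also matches tokens such as "Bot"
    if (stripos($userAgent, 'bot') !== false) {
        // This is probably a bot
    }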
    

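    If you want to be a little more precise, you can also check that the string contains a link, since crawlers usually include a URL describing themselves:

    // Default to an empty string when the client sends no User-Agent header
    $userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';
    // Require both a "bot" token and an info URL such as "+http://www.google.com/bot.html"
    if (stripos($userAgent, 'bot') !== false && stripos($userAgent, 'http') !== false) {
        // It is probably a bot
    }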
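To tie this back to the question, the two substring checks can be wrapped in one function whose result decides which content is served. This is only a minimal sketch: the function name `classifyClient` and its return labels are made up for illustration, and the heuristic is a guess about the client, not proof of its identity.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not a PHP built-in): guess the client type
 * from the User-Agent string using the two substring checks.
 */
function classifyClient(?string $userAgent): string
{
    // No header at all: we cannot say anything about the client
    if ($userAgent === null || $userAgent === '') {
        return 'unknown';
    }

    // Case-insensitive search for a "bot" token and for an info link
    $hasBotToken = stripos($userAgent, 'bot') !== false;
    $hasInfoLink = stripos($userAgent, 'http') !== false;

    return ($hasBotToken && $hasInfoLink) ? 'crawler' : 'browser';
}

// Usage: branch on the result to choose what to send back.
$client = classifyClient($_SERVER['HTTP_USER_AGENT'] ?? null);
$view = ($client === 'crawler') ? 'text-version' : 'regular-version';
```

Applied to the three sample strings from this answer, the Chrome string comes back as `browser` and the Googlebot and bingbot strings as `crawler`. The `$view` names are placeholders; serving different content this way is exactly the cloaking that the quoted definition describes.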
    

    The question "Rewrite rule for user agent with mod_rewrite" and its answers show how to make Apache deliver different content based on the user agent.
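Whether the check lives in PHP or in Apache, the user agent string is supplied by the client and can be forged, so it only tells you what the client claims to be. A stricter check is a reverse DNS lookup on the requesting IP followed by a forward lookup on the resulting host name. The sketch below assumes that approach; `isVerifiedCrawlerIp` is a hypothetical name, the suffix list is illustrative and should be checked against each search engine's own documentation, and the two resolver parameters exist only so the logic can be tried without network access.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: confirm a claimed crawler IP via reverse DNS,
 * then forward DNS. The default suffixes are illustrative assumptions.
 */
function isVerifiedCrawlerIp(
    string $ip,
    array $allowedSuffixes = ['.googlebot.com', '.google.com', '.search.msn.com'],
    ?callable $reverse = null,
    ?callable $forward = null
): bool {
    // Default to PHP's built-in resolvers (gethostbynamel() is IPv4 only)
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    // gethostbyaddr() returns the unchanged IP when nothing resolves,
    // or false on malformed input
    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // The resolved host name must end with one of the allowed suffixes
    $suffixMatches = false;
    foreach ($allowedSuffixes as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $suffixMatches = true;
            break;
        }
    }
    if (!$suffixMatches) {
        return false;
    }

    // The forward lookup must map the host name back to the same IP
    $addresses = $forward($host);
    return is_array($addresses) && in_array($ip, $addresses, true);
}
```

In a real request this would be called as `isVerifiedCrawlerIp($_SERVER['REMOTE_ADDR'])`, and only when the user agent already claims to be a crawler, since DNS lookups are slow enough that the result is worth caching.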