Search code examples
phpseosearch-engineuser-agent

Detecting a crawl (Search Engine's visit) using PHP


When a search engine visits a webpage, what does get_browser() function and $_SERVER['HTTP_USER_AGENT'] return?

Also, what is the other possible evidence that PHP offers when a search engine crawls a webpage?


Solution

    • The get_browser() function attempts to determine the browser's features (in array) but dont count too much on it because of the non standard user-agents; instead, for a serious app, build your own.

    • the $_SERVER["HTTP_USER_AGENT"] is a long string "describing" the user's browser and can be used as first parameter in the above function (optional); A tip: use this one to uncover user's browser instead of get_browser() itself! Also be prepared for a missing user agent as well! An example of this string is this:
      Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/418 (KHTML, like Gecko) Safari/417.9.3

    • a search engine or robot or spider or crawler that follows the rules will visit your page according to the information stored of robots.txt that must exist in your site's root. Without a robots.txt a spider can crawl the whole site, as long as it find links inside your pages; if you have this file you can program it so to tell the spider what to search; NOTE: this rule applies only to "good" spiders and not the bad ones