ajax, googlebot, google-crawlers

How does Googlebot know a web server is not cloaking when it requests a `?_escaped_fragment_=` URL?


With regard to Google's AJAX crawling spec, if the server returns one thing (namely, a JavaScript-heavy file) for a `#!` URL and something else (namely, an "HTML snapshot" of the page) to Googlebot when the `#!` is replaced with `?_escaped_fragment_=`, that feels like cloaking to me. After all, how can Googlebot be sure that the server is returning good-faith equivalents for both the `#!` and `?_escaped_fragment_=` URLs? Yet this is exactly what the AJAX crawling spec tells webmasters to do. Am I missing something? How is Googlebot sure that the server is returning the same content in both cases?
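For context, here is a minimal sketch of the mapping the spec describes, using only Python's standard library. The fragment after `#!` never reaches the server, so the crawler rewrites the URL and the server keys off the `_escaped_fragment_` query parameter; the markup and port below are made up for illustration:

```python
# Minimal sketch: serve a static HTML snapshot when the request carries the
# crawler's rewritten "_escaped_fragment_" parameter, and the JavaScript app
# shell otherwise. Content and paths are hypothetical.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

APP_SHELL = b"<html><body><script src='/app.js'></script></body></html>"
SNAPSHOT = b"<html><body><h1>Pre-rendered content for crawlers</h1></body></html>"

class AjaxCrawlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # "example.com/#!/page" arrives here as
        # "example.com/?_escaped_fragment_=/page" -- the "#!" part itself
        # is never sent to the server.
        query = parse_qs(urlparse(self.path).query)
        body = SNAPSHOT if "_escaped_fragment_" in query else APP_SHELL
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), AjaxCrawlHandler).serve_forever()
```

Nothing in this mechanism forces the snapshot to match what the JavaScript renders, which is exactly what prompts the question.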


Solution

  • The crawler does not know. But it never knows this even for sites that return plain ol' HTML - it is extremely easy to write code that cloaks a site based on the HTTP headers crawlers send or on known crawler IP addresses (see the sketch after this answer).

    See this related question: How does Google Know you are Cloaking?

    Most of it seems like conjecture, but it seems likely there are various checks in place, ranging from requesting pages with spoofed normal-browser headers to an actual person looking at the page.

    Continuing the conjecture, it certainly wouldn't be beyond the capabilities of Google's engineers to write a crawler that retrieves what the user actually sees - after all, they have their own browser that does just that. Doing so for every page would be prohibitively CPU-expensive, but it probably makes sense for occasional spot checks.
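To make the answer's two points concrete, here is a minimal, self-contained sketch (Python standard library only): a toy server that cloaks on the `User-Agent` header, plus the kind of spot check the answer speculates about, which fetches the same URL with a crawler User-Agent and a browser User-Agent and compares the bodies. The port, markup, and User-Agent strings are all made up for illustration, and real cloakers also key on known crawler IP ranges, which this sketch omits:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class CloakingHandler(BaseHTTPRequestHandler):
    """Toy cloaker: one page for self-identified crawlers, another for everyone else."""

    def do_GET(self):
        ua = self.headers.get("User-Agent", "")
        if "Googlebot" in ua:
            body = b"<html><body>keyword keyword keyword</body></html>"
        else:
            body = b"<html><body>The page real visitors see</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet

def fetch(url, user_agent):
    """Fetch a URL while presenting the given User-Agent string."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

if __name__ == "__main__":
    server = HTTPServer(("localhost", 8000), CloakingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    url = "http://localhost:8000/"
    as_bot = fetch(url, "Googlebot/2.1")
    as_browser = fetch(url, "Mozilla/5.0")
    # The spot check: if the bodies differ by User-Agent alone, the site
    # is serving crawlers something different from what users see.
    print("Cloaking detected!" if as_bot != as_browser else "Responses match.")
    server.shutdown()
```

A real check would render both responses (JavaScript included) before comparing, since legitimate `?_escaped_fragment_=` snapshots are byte-different from the app shell by design; the comparison that matters is between the rendered results.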