
Are robots.txt and meta tags enough to stop search engines from indexing dynamic pages that depend on $_GET variables?


I created a PHP page that is only accessible with a token/pass received through $_GET.

So if you go to the following URL, you'll get a generic or blank page:

http://fakepage11.com/secret_page.php

However, if you use the link with the token, it shows you special content:

http://fakepage11.com/secret_page.php?token=344ee833bde0d8fa008de206606769e4

Of course, this is not as safe as a login page, but my only concern is to create a dynamic page that is not indexable and can only be reached through the provided link.

Are dynamic pages that depend on $_GET variables indexed by Google and other search engines?

If so, will including the following be enough to hide it?

  • robots.txt:
      User-agent: *
      Disallow: /

  • metadata: <META NAME="ROBOTS" CONTENT="NOINDEX">

Would that keep it hidden even if I type the following into Google?

site:fakepage11.com/

Thank you!


Solution

  • If a search engine bot finds the link with the token somehow¹, it may crawl and index it.

    If you use robots.txt to disallow crawling the page, conforming search engine bots won’t crawl the page, but they may still index its URL (which then might appear in a site: search).

    If you use meta-robots to disallow indexing the page, conforming search engine bots won’t index the page, but they may still crawl it.

    You can’t have both: If you disallow crawling, conforming bots can never learn that you also disallow indexing, because they are not allowed to visit the page to see your meta-robots element.

    ¹ There are countless ways a search engine might find a link. For example, a user who visits the page might use a browser toolbar that automatically sends all visited URLs to a search engine.
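
The conflict between the two mechanisms can be illustrated with a small simulation. This is only a sketch of how a hypothetical conforming bot might behave, not how any real search engine is implemented: `simulate_bot`, `MetaRobotsParser`, and the user agent name `ExampleBot` are made up for illustration, and the bot is assumed to check robots.txt first and read the meta-robots element only if crawling is allowed.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives of any <meta name="robots"> element."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def simulate_bot(robots_txt, url, html, agent="ExampleBot"):
    """Return (may_crawl, sees_noindex) for a hypothetical conforming bot."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    may_crawl = rules.can_fetch(agent, url)
    if not may_crawl:
        # The page is never downloaded, so its meta-robots element stays unseen.
        return (False, False)
    parser = MetaRobotsParser()
    parser.feed(html)
    return (True, "noindex" in parser.directives)


url = "http://fakepage11.com/secret_page.php?token=344ee833bde0d8fa008de206606769e4"
page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX"></head></html>'

# robots.txt blocks everything: the noindex instruction is never read.
blocked = simulate_bot("User-agent: *\nDisallow: /", url, page)

# No robots.txt rules: the bot crawls the page and finds the noindex instruction.
allowed = simulate_bot("", url, page)
```

In the first call the bot is not allowed to fetch the page, so it cannot learn about NOINDEX and may still list the bare URL. In the second call it crawls the page and sees NOINDEX, which is why leaving the page crawlable is the variant that lets a conforming bot keep it out of the index.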