
Stopping indexing of GitHub Pages


I have a GitHub Pages site from my repository username.github.io.

However, I do not want Google to crawl my website, and I absolutely do not want it to show up in search results.

Will just using robots.txt in GitHub Pages work? I know there are tutorials for stopping a GitHub repository from being indexed, but what about the actual GitHub Pages site?


Solution

  • Will just using robots.txt in github pages work?

    If you're using the default GitHub Pages subdomain, then no, because Google would check only https://github.io/robots.txt.

    You can make sure you don't have a master branch, or make your GitHub repo private, although, as commented by olavimmanuel and detailed in olavimmanuel's answer, this would not change anything.

    However, if you're using a custom domain with your GitHub Pages site, you can place a robots.txt file at the root of your repo and it will work as expected. One example of using this pattern is the repo for Bootstrap.
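
    For instance, a minimal robots.txt asking every well-behaved crawler to stay away from the entire site might look like this (User-agent: * matches all crawlers, and Disallow: / covers every path; adjust the rules if you only want to block parts of the site):

        User-agent: *
        Disallow: /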

    However, as bmaupin points out, Google's own documentation says:

    A robots.txt file tells search engine crawlers which URLs the crawler can access on your site.

    This is used mainly to avoid overloading your site with requests; it is not a mechanism for keeping a web page out of Google.

    To keep a web page out of Google, block indexing with noindex or password-protect the page.
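
    As a rough sketch of the noindex approach (the page content below is a hypothetical placeholder; adapt it to your own layout or Jekyll templates), you would add a robots meta tag to the head of every page you want kept out of results:

        <!DOCTYPE html>
        <html>
        <head>
          <!-- Ask compliant search engines not to index this page -->
          <meta name="robots" content="noindex">
          <title>My page</title>
        </head>
        <body>...</body>
        </html>

    Note that a crawler has to be able to fetch the page in order to see the tag, so don't also block the page in robots.txt. And since GitHub Pages doesn't let you set custom HTTP response headers, the header variant of this directive (X-Robots-Tag) isn't an option there; the meta tag is the practical choice.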