Tags: performance, googlebot

How to prevent Googlebot from overwhelming my site?


I'm running a site with a lot of content, but little traffic, on a middle-of-the-road dedicated server.

Occasionally, Googlebot will stampede us, resulting in Apache maxing out its memory, and causing the server to crash.

How can I avoid this?


Solution

    • Register at Google Webmaster Tools (now Google Search Console), verify your site, and turn Googlebot's crawl rate down there
    • Submit a sitemap
    • Read Google's crawling guidelines and support the If-Modified-Since HTTP header, so unchanged pages can be answered with 304 Not Modified (see the sketch after this list)
    • Use robots.txt to restrict the bot's access to some parts of the website (example below)
    • Make a script that changes robots.txt each $[period of time], so the bot is never able to crawl too many pages at the same time while still being able to crawl all of the content overall (a rotation sketch is below)
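
A minimal sketch of the If-Modified-Since idea, using Python's standard library rather than whatever your actual Apache stack is: if the crawler already has a current copy of a page, the server answers 304 Not Modified with no body, so repeat crawls of unchanged content cost almost nothing. The page body and timestamp are placeholders.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime, format_datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical "last changed" time and body for the page being served.
PAGE_LAST_MODIFIED = datetime(2024, 1, 1, tzinfo=timezone.utc)
PAGE_BODY = b"<html><body>Big content page</body></html>"

class ConditionalHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        ims = self.headers.get("If-Modified-Since")
        if ims:
            try:
                since = parsedate_to_datetime(ims)
                if PAGE_LAST_MODIFIED <= since:
                    # Client copy is still current: no body, almost no work.
                    self.send_response(304)
                    self.end_headers()
                    return
            except (TypeError, ValueError):
                pass  # Malformed header: fall through and send the full page
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Last-Modified",
                         format_datetime(PAGE_LAST_MODIFIED, usegmt=True))
        self.send_header("Content-Length", str(len(PAGE_BODY)))
        self.end_headers()
        self.wfile.write(PAGE_BODY)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), ConditionalHandler).serve_forever()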
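
For the robots.txt point, a small example along these lines keeps crawlers out of heavy or low-value sections; the paths are hypothetical and should be replaced with whatever parts of your site you want off-limits. Note that Googlebot ignores Crawl-delay (its rate is set in Search Console, per the first bullet), though some other bots do honor it.

```
# Keep well-behaved crawlers out of heavy, low-value sections (placeholder paths)
User-agent: *
Disallow: /search/
Disallow: /print/
Disallow: /calendar/

# Crawl-delay is ignored by Googlebot but honored by some other bots
User-agent: bingbot
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml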
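
And a hedged sketch of the rotating-robots.txt idea: only one section of the site is open to crawling at a time, and the open section changes every few hours, so the bot can never hit everything at once but still reaches all content over a full rotation. The section paths, the schedule, and the robots.txt location are assumptions; run it from cron or a similar scheduler.

```python
from pathlib import Path
import time

ROBOTS_PATH = Path("/var/www/html/robots.txt")    # assumed docroot location
SECTIONS = ["/articles/", "/archive/", "/forums/", "/media/"]  # hypothetical
ROTATION_SECONDS = 6 * 60 * 60                     # switch every 6 hours

def write_robots(open_section: str) -> None:
    """Disallow every section except the one currently open to crawling."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {s}" for s in SECTIONS if s != open_section]
    lines.append("")  # trailing newline
    ROBOTS_PATH.write_text("\n".join(lines))

def current_section() -> str:
    # Derive the open section from the clock so repeated runs agree.
    index = int(time.time() // ROTATION_SECONDS) % len(SECTIONS)
    return SECTIONS[index]

if __name__ == "__main__":
    write_robots(current_section())
```

One caveat: Google caches robots.txt (typically for up to a day), so rotation periods much shorter than that may not take effect as quickly as you expect.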