robots.txt disallow: spider


I'm looking at the robots.txt file of a site I would like to do a one-off scrape of, and it contains these lines:

    User-agent: spider
    Disallow: /

Does this mean they don't want any spiders at all? I was under the impression that * was used to mean all spiders. If it does, this would of course also stop crawlers such as Google's.


Solution

  • This only asks agents that identify themselves as spider to be polite enough not to crawl the site.

    The name spider has no special meaning; it is not a wildcard for all crawlers.

    robots.txt files are read only by robots, so the way to exclude all robots is to use the * wildcard as the user agent:

    User-Agent: *
    Disallow: /
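One way to see the difference is to feed both records to Python's standard-library parser, `urllib.robotparser.RobotFileParser`, which can parse robots.txt text directly without any network access. This is a sketch: the agent names `Googlebot` and `MyScraper/1.0`, the helper `allowed`, and the URL `https://example.com/page` are illustrative placeholders, and other crawlers may implement matching differently. In particular, this parser matches agent names case-insensitively and by substring, so it would also apply the `spider` record to an agent whose name merely contains that word.

```python
from urllib.robotparser import RobotFileParser

# The record from the question: it only names an agent called "spider".
SPIDER_ONLY = """\
User-agent: spider
Disallow: /
"""

# The wildcard record from the answer: "*" matches every agent.
ALL_AGENTS = """\
User-Agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, url="https://example.com/page"):
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so nothing is downloaded here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder agent names, chosen only for illustration.
for agent in ("spider", "Googlebot", "MyScraper/1.0"):
    print(agent, allowed(SPIDER_ONLY, agent), allowed(ALL_AGENTS, agent))

# Prints:
# spider False False
# Googlebot True False
# MyScraper/1.0 True False
```

Only the agent named spider is blocked by the first file, while the wildcard record blocks every agent, including Googlebot.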