I have a server hosting multiple websites, and I want to block crawlers from only one of them. I know that robots.txt accepts the following:
User-agent: *
Disallow: /
to block bots from crawling a site, but the articles I read use ambiguous language: some say this blocks the site, others say it blocks the whole server.
If this file is in the root directory of one site, will it block only that site? Is there a better practice for doing this?
A given robots.txt file only controls crawling of pages on the domain and subdomain it was requested from. The crawler does not know and does not care if different domains are hosted on the same physical server. They are still different domains. The file http://aaa.com/robots.txt applies only to pages on http://aaa.com/, and http://bbb.com/robots.txt applies only to pages on http://bbb.com/. They can be hosted on the same physical machine, or on different servers on opposite sides of the world.
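For example, with a typical virtual-host layout where each site has its own document root (the paths below are hypothetical, adjust them to your server's configuration), you would place the blocking rules only in the docroot of the site you want hidden:

    # /var/www/aaa.com/robots.txt — served at http://aaa.com/robots.txt
    # Blocks all crawlers from aaa.com only
    User-agent: *
    Disallow: /

    # /var/www/bbb.com/robots.txt — served at http://bbb.com/robots.txt
    # Empty Disallow means bbb.com stays fully crawlable
    User-agent: *
    Disallow:

Each file applies only to the host it is served from; the other sites on the server need no robots.txt at all if you are happy for them to be crawled.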