I've googled a lot and read plenty of articles, but the answers are mixed.
I'm a little confused about which is the better option if I want a certain section of my site to be kept out of search engine indexes. Basically, I make a lot of updates to my site and also design for clients, and I don't want all the "test data" I upload for previews to be indexed, to avoid duplicate content issues.
Should I use a sub-domain and block the whole sub-domain, or create a sub-directory and block it using robots.txt?
I'm new to web design and am a little unsure about using sub-domains (I read somewhere that it's a somewhat advanced procedure and even a tiny mistake could have big consequences). Moreover, Matt Cutts has also mentioned something similar (source):
"I’d recommend using sub directories until you start to feel pretty confident with the architecture of your site. At that point, you’ll be better equipped to make the right decision for your own site."
But on the other hand, I'm hesitant about using robots.txt as well, since anyone can access the file and see exactly which paths I'm trying to keep out of the index.
What are the pros and cons of both?
For now I'm under the impression that Google treats both similarly and that it would be best to go with a sub-directory blocked via robots.txt, but I'd like a second opinion before "taking the plunge".
Either you ask bots not to crawl your content (→ robots.txt) or you lock everyone out (→ password protection).
For this decision it's not relevant whether you use a separate subdomain or a folder. You can use robots.txt or password protection for both. Note that the robots.txt file always has to be placed in the document root, so a separate subdomain needs its own robots.txt at the root of that subdomain.
Using robots.txt gives no guarantee, it's only a polite request. Polite bots will honor it, others won't. Human users will still be able to visit your "disallowed" pages. Even bots that honor your robots.txt (e.g. Google) may still link to your "disallowed" content in their search results (they just won't index the content).
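For example, a minimal robots.txt that asks all crawlers to stay out of a preview folder could look like this (the /previews/ path is just a placeholder for whatever directory you actually use):

    # applies to all crawlers
    User-agent: *
    # please don't crawl the preview area
    Disallow: /previews/

If you went the subdomain route instead, the robots.txt at the root of that subdomain would cover the whole host with:

    User-agent: *
    Disallow: /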
Using a login mechanism protects your pages from all bots and visitors.
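As a rough sketch, assuming an Apache server where .htaccess overrides are allowed, password-protecting a preview directory with HTTP basic auth could look like this (the file paths, realm name, and user name are placeholders):

    # .htaccess placed inside the protected directory
    AuthType Basic
    AuthName "Client previews"
    AuthUserFile /home/example/.htpasswd
    Require valid-user

The matching .htpasswd file can be created with Apache's htpasswd tool, e.g. htpasswd -c /home/example/.htpasswd previewuser. Other servers (nginx, IIS) offer equivalent basic-auth mechanisms.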