
How to disallow a list of URLs from being crawled by the Google crawler using robots.txt


I have a couple of pages and URLs which I do not want to be crawled by the Google crawler.

I know it can be done via robots.txt. I searched Google and found that the rules have to be arranged in robots.txt like this to disallow the crawler, but I am not sure whether it is right or not.

User-Agent: *
Disallow: /music?
Disallow: /widgets/radio?

Disallow: /affiliate/
Disallow: /affiliate_redirect.php
Disallow: /affiliate_sendto.php
Disallow: /affiliatelink.php
Disallow: /campaignlink.php
Disallow: /delivery.php

Disallow: /music/+noredirect/
Disallow: /user/*/library/music/
Disallow: /*/+news/*/visit
Disallow: /*/+wiki/diff

# AJAX content
Disallow: /search/autocomplete
Disallow: /template
Disallow: /ajax
Disallow: /user/*/tasteomatic

Can I give the URL this way? I mean, can I specify a full URL in a Disallow line?

Disallow: http://www.bba-reman.com/admin/feedback.htm

EDIT

My current robots.txt entries look like this:

User-Agent: *
Disallow: /CheckLogin
Disallow: /DTC.pdf
Disallow: /catalogue/bmw.htm
Disallow: /auto-mine/bmw/index.htm
Disallow: /forums/parent.Jmp('i100')
Disallow: /forums/parent.Jmp('i040')
Disallow: /forums/CodeDescriptions.html
Disallow: /forums/parent.Jmp('i050')
Disallow: /forums/parent.Scl('000','24601')
Disallow: /forums/parent.Jmp('i030')
Disallow: /catalogue/peugeot.htm

Is this OK? Please just tell me. Thanks.


Solution

  • The value of the Disallow field is always the beginning of the URL path.

    So if your robots.txt is accessible from http://example.com/robots.txt, and it contains this line

    Disallow: http://example.com/admin/feedback.htm
    

    then the value is read as a path, not as a full URL, so at most URLs like these would be disallowed:

    • http://example.com/http://example.com/admin/feedback.htm
    • http://example.com/http://example.com/admin/feedback.html
    • http://example.com/http://example.com/admin/feedback.htm_foo
    • http://example.com/http://example.com/admin/feedback.htm/bar

    So if you want to disallow the URL http://example.com/admin/feedback.htm, you have to use

    Disallow: /admin/feedback.htm
    

    which would block URLs like these:

    • http://example.com/admin/feedback.htm
    • http://example.com/admin/feedback.html
    • http://example.com/admin/feedback.htm_foo
    • http://example.com/admin/feedback.htm/bar
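
The prefix rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any particular crawler is implemented: the helper name `is_disallowed` and the variables `relative_rule` and `full_url_rule` are made up for this example, and the function does plain prefix matching only (no `Allow` lines, no user-agent groups, no `*` or `$` wildcards).

```python
from urllib.parse import urlsplit


def is_disallowed(url, disallow_values):
    """Return True if the URL's path (plus query) starts with any Disallow value.

    Simplified on purpose: plain prefix matching only, with no support for
    Allow lines, user-agent groups, or the * and $ wildcard extensions.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # An empty Disallow value blocks nothing, so skip it.
    return any(value and path.startswith(value) for value in disallow_values)


# Hypothetical rule lists: one relative path, one full URL used by mistake.
relative_rule = ["/admin/feedback.htm"]
full_url_rule = ["http://example.com/admin/feedback.htm"]

for url in (
    "http://example.com/admin/feedback.htm",
    "http://example.com/admin/feedback.html",
    "http://example.com/admin/feedback.htm_foo",
    "http://example.com/admin/feedback.htm/bar",
    "http://example.com/admin/other.htm",
):
    # Print the URL and whether each rule list would block it.
    print(url, is_disallowed(url, relative_rule), is_disallowed(url, full_url_rule))
```

In this sketch the relative value blocks the first four URLs and leaves `/admin/other.htm` alone, while the full-URL value blocks none of them, because the path of `http://example.com/admin/feedback.htm` is `/admin/feedback.htm` and does not begin with `http://`.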