
Incomplete robots.txt, what happens?


I have a page on a website and don't have access to anything other than my page. The website is used to sell various small products. After being with them for over a year, and having used Google AdWords to help promote myself, I noticed that none of the product pages were indexed on Google. I then noticed that the robots.txt for the site doesn't contain much, and I'm wondering if this has anything to do with it.

Product URLs follow this format:

www.example.com/myshopname/[product ID]?q=I[product ID]

And the robots.txt is simply:

Disallow: /*_escaped_fragment_

There's no User-agent line. I'm wondering whether this has any effect on Google crawling my page, or whether Google would simply ignore the robots.txt since no user-agent was specified.


Solution

  • I will give you some more info here:

    The robots.txt file is a simple text file on your web server which tells web crawlers whether or not they may access a file. You can always access this file, because it is not part of your server's system files but part of your site.

    In your case I don't know exactly what this /*_escaped_fragment_ rule is meant to block (it looks like a leftover from Google's old AJAX crawling scheme), but:

    User-agent: *
    Disallow: /
    

    Will block access for all crawlers.

    While this:

    User-agent: *
    Disallow:
    

    Will allow full access to your website.
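    These two cases can be verified locally with Python's standard-library `urllib.robotparser` (a quick sketch; the host and the agent name `AnyBot` are just placeholders):

```python
from urllib.robotparser import RobotFileParser

# "Disallow: /" blocks every path for every crawler.
block_all = RobotFileParser()
block_all.parse(["User-agent: *", "Disallow: /"])
print(block_all.can_fetch("AnyBot", "http://www.example.com/myshopname/123"))  # False

# An empty "Disallow:" allows everything.
allow_all = RobotFileParser()
allow_all.parse(["User-agent: *", "Disallow:"])
print(allow_all.can_fetch("AnyBot", "http://www.example.com/myshopname/123"))  # True
```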

    User-agent: *
    Disallow: /images/
    

    Will block access to the specified folder.

    User-agent: *
    Disallow: /images
    Allow: /images/my_photo.jpg
    

    Even if you disallow a folder, you can always give access to a specific file in that folder.
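    This exception can also be checked locally with `urllib.robotparser`, with one caveat: Python's parser applies rules in file order (first match wins), so for this check the Allow line has to come before the Disallow line, whereas Google applies the most specific (longest) matching rule regardless of order. A sketch, with a made-up agent name `MyBot`:

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /images/my_photo.jpg
Disallow: /images/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("MyBot", "/images/my_photo.jpg"))  # True: the Allow exception applies
print(rp.can_fetch("MyBot", "/images/other.jpg"))     # False: the folder is disallowed
```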

    User-agent: *
    Disallow: /assets.html
    

    Will block access to the specified file.

    So the star means all crawlers. If you want to apply the directives to a specific crawler, you need to name it:

    User-agent: Googlebot
    

    If you are specifically interested in Googlebot and want to see whether your robots.txt is blocking files or folders on your site, visit https://developers.google.com/ and use the robots.txt testing tool to check whether you are blocking page resources.
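    Besides Google's online tester, you can check agent-specific rules locally with the standard library. A sketch (the file below gives Googlebot its own record while blocking everyone else; `BadBot` is a made-up agent name):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/myshopname/123"))  # True: Googlebot matches its own record
print(rp.can_fetch("BadBot", "/myshopname/123"))     # False: everyone else falls under *
```

    Note that once a crawler matches a specific User-agent record, it ignores the * record entirely.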

    It is also necessary to say that while robots.txt can be a useful tool for your SEO, its directives will only be respected by regular, well-behaved crawlers.

    Malicious crawlers do not care about those directives.