
phpBB robots.txt rules?


I am currently editing my robots.txt, which looks like this:

User-agent: *
Disallow: /adm/*
Disallow: /download/*
Disallow: /cache
Disallow: /files
Disallow: /viewforum.php?f=146
Disallow: /ucp.php
Disallow: /mcp.php
Disallow: /memberlist.php
Disallow: /config.php
Disallow: /cron.php
Disallow: /faq.php
Disallow: /report.php
Sitemap: http://www.website.com/app.php/sitemap.xml
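To see how one simple parser reads the file above, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The sample URLs (for example `/ucp.php?mode=login` or `/adm/index.php`) are hypothetical and were picked only to exercise individual rules. One assumption matters here: `urllib.robotparser` does plain prefix matching and does not implement the `*` wildcard inside paths, whereas crawlers that follow RFC 9309 or Google's rules treat `*` as a wildcard. Results for the `/adm/*` and `/download/*` lines can therefore differ between crawlers.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, verbatim.
ROBOTS_TXT = """\
User-agent: *
Disallow: /adm/*
Disallow: /download/*
Disallow: /cache
Disallow: /files
Disallow: /viewforum.php?f=146
Disallow: /ucp.php
Disallow: /mcp.php
Disallow: /memberlist.php
Disallow: /config.php
Disallow: /cron.php
Disallow: /faq.php
Disallow: /report.php
Sitemap: http://www.website.com/app.php/sitemap.xml
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Hypothetical URLs, chosen only to exercise individual rules.
SAMPLE_URLS = [
    "http://www.website.com/ucp.php?mode=login",
    "http://www.website.com/viewforum.php?f=146",
    "http://www.website.com/viewforum.php?f=2",
    "http://www.website.com/adm/index.php",
    "http://www.website.com/cache/data.php",
]

if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in SAMPLE_URLS:
        # "*" falls through to the catch-all "User-agent: *" group.
        verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
        print(f"{verdict:8} {url}")
```

With this parser, `/adm/index.php` comes out as allowed, because no URL literally starts with `/adm/*`. Writing `Disallow: /adm/` instead blocks everything under `/adm/` for prefix-only and wildcard-aware parsers alike.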

However, I am not sure how to do a few things correctly.

1) Would this correctly block search engines from accessing a forum area?

Disallow: /viewforum.php?f=146

I want one area hidden from search engines, but the rest of the forum areas fully readable as normal.

2) How do I block access to the internal phpBB folders and keep search engines out of the admin area? Are these rules correct?

Disallow: /adm/*
Disallow: /download/*

3) Do the rules for php files work correctly?

Disallow: /ucp.php

Also, is there anything else I should know or do?


Solution

  • The line

    Disallow: /viewforum.php?f=146
    

    disallows crawling of URLs whose path (including the query string) starts with /viewforum.php?f=146.

    So URLs like these would not be allowed to be crawled:

    • http://example.com/viewforum.php?f=146
    • http://example.com/viewforum.php?f=1461
    • http://example.com/viewforum.php?f=146a
    • http://example.com/viewforum.php?f=146/foo
    • http://example.com/viewforum.php?f=146&bar

    (It works the same for /ucp.php, /adm/, and /download/, of course. Note that this means the appended * is not needed, unless it’s actually part of the URL.)

    So if the forum overview is at http://example.com/viewforum.php?f=146, it will be blocked. However, the same page might also be reachable under a different URL that this rule does not match, e.g. something like: http://example.com/viewforum.php?someOtherParameter&f=146

    Also note that this will not necessarily block crawling of the forum threads in that area, because their URLs typically don’t start with this path (phpBB serves topics via /viewtopic.php, not /viewforum.php). While conforming bots won’t crawl this forum area page, they might find links to the threads from some other place.
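The prefix behaviour described above can be checked with a minimal Python sketch. It assumes the standard library's `urllib.robotparser` as a stand-in for a conforming crawler: it compares each rule against the URL's path plus query string as a plain prefix. `example.com` and the topic URL `viewtopic.php?f=146&t=1` are illustrative assumptions, not taken from a real board.

```python
from urllib.robotparser import RobotFileParser

# Only the rule under discussion, in the catch-all user-agent group.
RULES = ["User-agent: *", "Disallow: /viewforum.php?f=146"]


def is_blocked(url: str, rules=RULES) -> bool:
    """Return True if a prefix-matching parser would disallow `url`."""
    parser = RobotFileParser()
    parser.parse(rules)
    return not parser.can_fetch("*", url)


# URLs from the answer; each starts with /viewforum.php?f=146.
BLOCKED = [
    "http://example.com/viewforum.php?f=146",
    "http://example.com/viewforum.php?f=1461",
    "http://example.com/viewforum.php?f=146a",
    "http://example.com/viewforum.php?f=146/foo",
    "http://example.com/viewforum.php?f=146&bar",
]

# Same forum with a different parameter order, plus a hypothetical
# topic URL in that forum: neither starts with the disallowed prefix.
NOT_BLOCKED = [
    "http://example.com/viewforum.php?someOtherParameter&f=146",
    "http://example.com/viewtopic.php?f=146&t=1",
]

if __name__ == "__main__":
    for url in BLOCKED + NOT_BLOCKED:
        print("blocked" if is_blocked(url) else "allowed", url)
```

Because the match is a prefix test, adding more `Disallow` lines (one per URL shape) is the only way to cover the alternative URLs with robots.txt alone.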