Tags: php, robots.txt, googlebot

How do I create a Googlebot-friendly robots.txt in WordPress?


I am using WordPress. One of its files, functions.php, contains the function do_robots() { ..., which blocks Google from crawling the site. I have replaced this function with the following:

function do_robots() {
    header( 'Content-Type: text/plain; charset=utf-8' );

    do_action( 'do_robotstxt' );

    if ( '0' == get_option( 'blog_public' ) ) {
        echo "User-agent: *";
        echo "\nDisallow: /wp-admin";
        echo "\nDisallow: /wp-includes";
        echo "\nDisallow: /wp-content";
        echo "\nDisallow: /stylesheets";
        echo "\nDisallow: /_db_backups";
        echo "\nDisallow: /cgi";
        echo "\nDisallow: /store";
        echo "\nDisallow: /wp-includes\n";
    } else {
        echo "User-agent: *";
        echo "\nDisallow: /wp-admin";
        echo "\nDisallow: /wp-includes";
        echo "\nDisallow: /wp-content";
        echo "\nDisallow: /stylesheets";
        echo "\nDisallow: /_db_backups";
        echo "\nDisallow: /cgi";
        echo "\nDisallow: /store";
        echo "\nDisallow: /wp-includes\n";
    }
}
  1. I am not quite sure about Allow. Is everything allowed by default as long as I do not Disallow it?
  2. Why does Google Bot still get blocked by the above function?

Solution

  • The original function from SVN blocks fewer paths than your version above, so I would recommend removing some of the extra directories (e.g. wp-content) and seeing if that gives you what you're looking for. You could also try a WordPress sitemap plugin to generate an XML sitemap for Google to read.

    function do_robots() {
        header( 'Content-Type: text/plain; charset=utf-8' );
    
        do_action( 'do_robotstxt' );
    
        $output = "User-agent: *\n";
        $public = get_option( 'blog_public' );
        if ( '0' == $public ) {
            $output .= "Disallow: /\n";
        } else {
            $site_url = parse_url( site_url() );
            $path = ( !empty( $site_url['path'] ) ) ? $site_url['path'] : '';
            $output .= "Disallow: $path/wp-admin/\n";
            $output .= "Disallow: $path/wp-includes/\n";
        }
    
        echo apply_filters('robots_txt', $output, $public);
    }
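
    The parse_url() step in that function exists so that a WordPress install living in a subdirectory still emits correct paths. Here is a standalone sketch of that logic, using a hypothetical subdirectory URL in place of site_url():

```php
<?php
// Standalone sketch of the path handling in do_robots() above.
// 'https://example.com/blog' is a hypothetical stand-in for site_url().
$site_url = parse_url( 'https://example.com/blog' );
$path     = ! empty( $site_url['path'] ) ? $site_url['path'] : '';

$output  = "User-agent: *\n";
$output .= "Disallow: $path/wp-admin/\n";
$output .= "Disallow: $path/wp-includes/\n";

echo $output;
```

    For a site installed at the domain root, parse_url() returns no 'path' element, so $path is empty and the rules reduce to plain Disallow: /wp-admin/ and Disallow: /wp-includes/.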
    

    The rule for robots.txt files is that everything is allowed unless explicitly disallowed. Keep in mind, though, that robots.txt is a trust system: well-behaved crawlers obey it, but nothing enforces it.
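
    Also note the apply_filters('robots_txt', ...) call at the end of the core function: rather than replacing do_robots() itself (a core update would overwrite the change), you can hook that filter from a theme or plugin. A sketch, with hypothetical extra rules and a hypothetical callback name:

```php
<?php
// Sketch: extend the generated robots.txt via the 'robots_txt' filter
// instead of editing WordPress core. The paths below are examples only.
function my_extra_robots_rules( $output, $public ) {
    if ( '0' !== $public ) { // site is public; append extra rules
        $output .= "Disallow: /cgi-bin/\n";
        $output .= "Disallow: /store/\n";
    }
    return $output;
}

// In a theme's functions.php or a plugin (requires WordPress to run):
// add_filter( 'robots_txt', 'my_extra_robots_rules', 10, 2 );
```

    Because the filter receives the blog_public option as its second argument, the callback can skip the extra rules when the site is marked private and core has already emitted Disallow: /.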