I am using WordPress. One of its files, functions.php, contains the function do_robots() { ... }, which blocks Google from crawling the site. I have replaced this function with the following:
function do_robots() {
	header( 'Content-Type: text/plain; charset=utf-8' );
	do_action( 'do_robotstxt' );
	if ( '0' == get_option( 'blog_public' ) ) {
		echo "User-agent: *";
		echo "\nDisallow: /wp-admin";
		echo "\nDisallow: /wp-includes";
		echo "\nDisallow: /wp-content";
		echo "\nDisallow: /stylesheets";
		echo "\nDisallow: /_db_backups";
		echo "\nDisallow: /cgi";
		echo "\nDisallow: /store";
		echo "\nDisallow: /wp-includes\n";
	} else {
		echo "User-agent: *";
		echo "\nDisallow: /wp-admin";
		echo "\nDisallow: /wp-includes";
		echo "\nDisallow: /wp-content";
		echo "\nDisallow: /stylesheets";
		echo "\nDisallow: /_db_backups";
		echo "\nDisallow: /cgi";
		echo "\nDisallow: /store";
		echo "\nDisallow: /wp-includes\n";
	}
}
I have not added any Allow lines. Is it that, as long as I do not Disallow a path, it is Allow by default? Or do I need to add Allow rules to this function?

The original function out of SVN looks like it blocks fewer paths than your example above, so I would recommend removing some of the extra directories (e.g. wp-content) and seeing if that gives you what you're looking for. You could also try a WordPress plugin that generates a Google Sitemap for their engine to read.
function do_robots() {
	header( 'Content-Type: text/plain; charset=utf-8' );
	do_action( 'do_robotstxt' );

	$output = "User-agent: *\n";
	$public = get_option( 'blog_public' );
	if ( '0' == $public ) {
		$output .= "Disallow: /\n";
	} else {
		$site_url = parse_url( site_url() );
		$path     = ( ! empty( $site_url['path'] ) ) ? $site_url['path'] : '';
		$output  .= "Disallow: $path/wp-admin/\n";
		$output  .= "Disallow: $path/wp-includes/\n";
	}

	echo apply_filters( 'robots_txt', $output, $public );
}
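Note the last line: the generated rules are passed through the robots_txt filter, so extra Disallow lines can be added from a theme or plugin without editing the core function at all. Here is a minimal sketch of that approach; the callback name and the extra paths are just illustrative, not anything from your setup:

// Append extra rules to the robots.txt output via the robots_txt filter.
// my_extra_robots_rules is a hypothetical callback name; the paths are examples.
add_filter( 'robots_txt', 'my_extra_robots_rules', 10, 2 );

function my_extra_robots_rules( $output, $public ) {
	// Only append rules when the blog is public; a non-public blog is
	// already fully disallowed by the core function above.
	if ( '0' != $public ) {
		$output .= "Disallow: /cgi-bin/\n";
		$output .= "Disallow: /store/\n";
	}
	return $output;
}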
The rule for robots.txt files is that everything is allowed unless it is explicitly disallowed; also keep in mind that a search engine obeying robots.txt is more of a trust system than an enforcement mechanism.
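For instance, a robots.txt containing nothing but the lines below (the path is just an example) keeps crawlers out of /wp-admin/ while leaving every other path crawlable, with no explicit Allow rule needed:

User-agent: *
Disallow: /wp-admin/
# Any path not matched by a Disallow line above may be crawled by default.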