php apache .htaccess mod-rewrite google-search

Why is Google indexing Friendly URL mixed with hyphens and %20?


I developed a blog from scratch and things have gone great so far. I finally got around to writing my first post/article, and I've been waiting for Google to index this specific page to make sure there aren't any issues with it. Well, Google is currently indexing the same page 4 times. I have (with the help of users from Stack Overflow) a mod_rewrite rule in my htaccess that rewrites the spaces in all URLs coming from a specific file (article.php) to hyphens.

My article page currently looks like this. Example: www.site.com/article.php?article_id=10&article_title=friendly url goes over here

With mod_rewrite I have changed the URLs to the following:

www.site.com/article/id/friendly-url-goes-over-here

But Google seems to be indexing the same page 4 times, like so:

www.site.com/article/10/friendly-url-goes-over-here
www.site.com/article/10/friendly-url-goes%20over%20here
www.site.com/article/10/friendly-url%20goes%20over%20here
www.site.com/article/10/friendly%20-url%20goes%20over%20here

Why is it indexing 4 copies of the same page? It seems to index one copy for each hyphen that is inserted, so if there were 10 hyphens, I'm guessing Google would index 10 copies of the same page. Here is my entire htaccess file.

RewriteEngine on

# add www before hostname
RewriteCond %{HTTP_HOST} ^oddify\.co$ [NC]
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [R=302,L,NE]

# if on article page, get slugs and make into friendly url
RewriteCond %{THE_REQUEST} \s/article\.php\?article_uid=([^&]+)&article_title=([^&\ ]+)
RewriteRule ^ /article/%1/%2/? [L,R=302,NE]

# if page with .php is requested then remove the extension
RewriteCond %{THE_REQUEST} \s/+(.+?)\.php[\s?] [NC]
RewriteRule ^ /%1/ [R=302,L,NE]

RewriteRule "^(article)/([^ ]*) +(.*)$" /$1/$2-$3 [L,R]

# Force a trailing slash to be added
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{THE_REQUEST} \s/+([^.]+?[^/.])[\s?] [NC]
RewriteRule ^ /%1/ [R=302,L]

# allow page direction to change the slugs into friendly seo URL
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f 
RewriteRule (?:^|/)article/([^/]+)/([^/]+)/?$ /webroot/article.php?article_uid=$1&article_title=$2 [L,QSA,NC]

# silently rewrite to webroot
RewriteCond %{REQUEST_URI} !/webroot/ [NC]
RewriteRule ^ /webroot%{REQUEST_URI} [L]

# .php ext hiding
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule ^(.+?)/?$ $1.php [L]

I wrote this question a few days ago and made sure to de-index the pages from Google, but Google has now gone ahead and re-indexed them the same way.

Here is the Google results page showing the 4 indexed pages: google search page


Solution

  • Try changing this redirect to a 301:

    RewriteRule "^(article)/([^ ]*) +(.*)$" /$1/$2-$3 [L,R=301]
    

    The 301 status tells Google (and browsers and other clients) that the redirect is permanent and that the old URL (the one with spaces) should no longer be considered.
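
One possible reading of why several variants show up in the index: the rule replaces only one run of spaces per request, so a title with several spaces goes through a chain of redirects, and a bare R flag defaults to 302 (temporary), which gives a crawler no signal to drop the intermediate URLs. The Python sketch below only simulates the rule's regex to list that chain. It is not Apache, and the function name redirect_chain is made up for illustration.

```python
import re

# Same pattern as the .htaccess rule:
#   RewriteRule "^(article)/([^ ]*) +(.*)$" /$1/$2-$3 [L,R]
SPACE_RULE = re.compile(r"^(article)/([^ ]*) +(.*)$")


def redirect_chain(path):
    """Return every URL path a client visits until no space is left.

    `path` has no leading slash, which is how mod_rewrite sees it in a
    per-directory (.htaccess) context. Hypothetical helper, for illustration.
    """
    visited = [path]
    while True:
        match = SPACE_RULE.match(path)
        if match is None:
            return visited
        # One external redirect: only the first run of spaces becomes a hyphen.
        path = f"{match.group(1)}/{match.group(2)}-{match.group(3)}"
        visited.append(path)


chain = redirect_chain("article/10/friendly url goes over here")
for step in chain:
    # Show spaces the way they appear in an indexed URL.
    print("/" + step.replace(" ", "%20"))
```

Each printed line after the first is a separate redirect target, and hyphens are filled in from left to right, which is the same shape as the mixed hyphen and %20 URLs listed in the question. With R=301 those intermediate URLs are marked as permanently moved.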