This question is for an Apache .htaccess file.
Let's say the real url on my website is:
https://abcs.org/about-scholarships
I have incoming spammy links (from other websites) that are of this format (examples):
https://abcs.org/about-scholarships95scholar
https://abcs.org/about-scholarships.happy99
https://abcs.org/about-scholarshipsFTO
https://abcs.org/about-scholarships$25.00
Basically, they are adding a random string onto the end of the correct URL.
I need a method to redirect any of these types of incorrect URLs to the proper URL.
I have been using Redirect 301 reactively, adding a new rule each time I catch another spam URL.
For example:
Redirect 301 /about-scholarships95scholar /about-scholarships
Redirect 301 /about-scholarships.happy99 /about-scholarships
This works just fine, but there are dozens of these being produced every day, for multiple different pages of my site.
I have tried using wildcards after the correct URL string (in Redirect 301, and in RewriteRule) but I can't figure out how to format it correctly.
Thank you for any help!
You can use a RedirectMatch (mod_alias) directive (that matches against a regex) to check for /about-scholarships<something> and remove the <something> part. For example:
RedirectMatch 301 ^(/about-scholarships). $1
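Where the . matches any single character (followed by anything or nothing, since the pattern is not anchored at the end). Because at least one extra character is required, the correct URL itself does not match, so there is no redirect loop. The $1 backreference contains the parenthesized subgroup from the preceding pattern, i.e. just the string "/about-scholarships" (which saves repetition).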
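To see which requests that pattern catches, here is a minimal sketch that runs the same regex through Python's re module. Python is only a stand-in for Apache's regex engine here (this simple pattern should behave the same in both), and redirect_target is a hypothetical helper name, not anything Apache provides.

```python
import re

# Same pattern as the RedirectMatch example above.
# Python's re is used purely for illustration; Apache itself uses PCRE.
PATTERN = re.compile(r"^(/about-scholarships).")


def redirect_target(path):
    """Return the path the rule would redirect to, or None if it does not match."""
    match = PATTERN.search(path)
    return match.group(1) if match else None


if __name__ == "__main__":
    samples = [
        "/about-scholarships",            # the real URL
        "/about-scholarships95scholar",
        "/about-scholarships.happy99",
        "/about-scholarships$25.00",
    ]
    for path in samples:
        print(path, "->", redirect_target(path))
```

The real URL yields None (no redirect), while each of the spam variants yields /about-scholarships.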
However, if you are routing your URLs via a front-controller pattern later in the .htaccess file then you should use mod_rewrite instead for this redirect to avoid conflicts. For example, the following should go at the top of the root .htaccess file (before any front-controller pattern):
RewriteEngine On
RewriteRule ^(about-scholarships). /$1 [R=301,L]
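Note the difference in the slash prefix compared with the RedirectMatch directive: in .htaccess, the RewriteRule pattern matches the URL-path without its leading slash, so the slash is added back in the substitution (/$1).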
There's no way to further optimise this in .htaccess. You would need a separate rule for each page of your site - which is not scalable if you have many pages. Ideally you would perform this check in your application logic (front-controller) where you could potentially automate this (although how efficient that would be is another matter).
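As a rough idea of what that application-level check might look like, here is a minimal sketch in Python. Everything in it is an assumption for illustration: KNOWN_PAGES stands in for however your application lists its valid URL-paths, and canonical_redirect is a hypothetical helper your front-controller would call before returning a 404.

```python
# Hypothetical list of valid URL-paths; in practice this would come from
# your router, CMS or database.
KNOWN_PAGES = {"/about-scholarships", "/contact"}


def canonical_redirect(path, known_pages=KNOWN_PAGES):
    """Return the known page to 301-redirect to, or None to handle the request normally."""
    if path in known_pages:
        return None  # exact match: a real page, no redirect
    # Any known page that the requested path starts with is a candidate.
    candidates = [page for page in known_pages if path.startswith(page)]
    # Prefer the longest prefix so /about-scholarshipsX is not sent to a shorter page.
    return max(candidates, key=len) if candidates else None
```

This does a linear scan over the known pages on every unmatched request, which is the efficiency concern mentioned above. It would also redirect any legitimate-looking but unknown URL that merely starts with a known page, so treat it as a starting point only.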
HOWEVER, if these are just "spammy" links then you should probably just let them 404 instead. (I assume they are otherwise returning a 404?) I can't see that redirecting would really serve any purpose?