regex, apache, .htaccess, urlencode, html-encode

Replacing URL encoding (%20) with + in .htaccess


Here's what I've got so far. It works fine for redirecting single keywords, but not so well if the URL contains a URL-encoded space (%20).

RewriteCond %{QUERY_STRING} (?:^|&)search=([^&]+) [NC]
RewriteRule ^jobs/?$ /?s=%1 [L,NC,R=301]

What I'd like to do is change %20 to be +.

For example, suppose I have the following URL:

http://www.example.com/jobs/?search=first%20second

I'd like to redirect it to the following:

http://www.example.com/?s=first+second

Thanks


Solution

  • ...but not so well if the URL contains a URL encoded space (%20).

    RewriteCond %{QUERY_STRING} (?:^|&)search=([^&]+) [NC]
    RewriteRule ^jobs/?$ /?s=%1 [L,NC,R=301]
    

    Since the QUERY_STRING is already URL encoded, you need to include the NE (noescape) flag on the RewriteRule directive to prevent URL-encoded sequences from being encoded a second time in the redirect response. Without NE, the % itself is encoded as %25, so %20 becomes %2520. This doesn't just apply to spaces, but to many other %-encoded non-alphanumeric characters as well.

    For example:

    RewriteRule ^jobs/?$ /?s=%1 [NE,L,NC,R=301]
    

    Do you still need to replace %20 with +? Whether spaces are URL encoded as %20 or + in the query string shouldn't strictly matter to the receiving application. Both will be URL-decoded as a literal space.

    NB: Only in the query string can a space be URL encoded as +. In other parts of the URL, a + is seen as the literal character (plus).

    Replace %20 with + in the search URL parameter value

    If you want to replace all %-encoded spaces (i.e. %20) with + (the alternative encoding) in the search URL parameter value, you can do it like this:

    Assumptions:

    • Only interested in the "search" URL parameter, since you are currently discarding all other URL parameters anyway.
    • There can be any number of %-encoded spaces (i.e. %20) in the URL parameter value.

    For example:

    # Replace all "%20" in the "search" URL parameter with "+"
    # eg. "/jobs?search=foo%20bar%20baz%20pop" to "/jobs?search=foo+bar+baz+pop"
    RewriteCond %{QUERY_STRING} (?:^|&)(search)=([^&]*)%20([^&]*) [NC]
    RewriteRule ^jobs/?$ $0?%1=%2+%3 [N]
    
    # Redirect "/jobs/?search=<foo>" to "/?s=<foo>"
    # (Original redirect with the NE flag added; 302 while testing)
    RewriteCond %{QUERY_STRING} (?:^|&)search=([^&]+) [NC]
    RewriteRule ^jobs/?$ /?s=%1 [NC,NE,R=302,L]
    

    The first rule loops internally until every occurrence of %20 has been replaced with + in the URL parameter value. The N (next) flag causes the ruleset to start over again from the top. On Apache 2.4.8+ you can set an upper (safe) limit on the number of iterations, e.g. N=20. If there is no %20, this rule is effectively skipped (since the condition does not match on the first pass).
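
    As a sketch of the iteration limit mentioned above (assuming a version of Apache 2.4 in which the N flag accepts a value), the first rule could be written as follows. The limit of 20 is an arbitrary figure for illustration, not a recommendation:

    # Same condition and substitution as the first rule, but with a cap on the
    # number of times the ruleset may restart (20 is an illustrative value)
    RewriteCond %{QUERY_STRING} (?:^|&)(search)=([^&]*)%20([^&]*) [NC]
    RewriteRule ^jobs/?$ $0?%1=%2+%3 [N=20]
    
    Each pass replaces a single %20, so a value containing more encoded spaces than the limit allows would not be fully rewritten. Choose a limit comfortably above the longest search phrase you expect.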

    The $0 (dollar-zero) backreference in the substitution string holds the entire match of the RewriteRule pattern, i.e. the requested URL-path (saves repetition). The %1 (percent-one) backreference simply holds the "search" URL parameter name (again, saves repetition). It is the first captured group in the last matched CondPattern.

    The %2 and %3 backreferences hold the values of the 2nd and 3rd captured groups in the last matched CondPattern, i.e. the parts of the URL parameter value before and after the last %20 sequence. It is the last one that matches because the first [^&]* is greedy.
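
    To make the backreferences concrete, here is a hand-worked trace of the loop for a hypothetical request to /jobs/?search=foo%20bar%20baz (the value is made up for illustration). It is written as comments so it can sit alongside the rules:

    # Pass 1: QUERY_STRING = search=foo%20bar%20baz
    #         %1 = "search", %2 = "foo%20bar", %3 = "baz"
    #         rewritten to: search=foo%20bar+baz   (N restarts the ruleset)
    # Pass 2: QUERY_STRING = search=foo%20bar+baz
    #         %1 = "search", %2 = "foo", %3 = "bar+baz"
    #         rewritten to: search=foo+bar+baz     (N restarts the ruleset)
    # Pass 3: no "%20" left, so the condition fails and processing
    #         falls through to the redirect rule: /?s=foo+bar+baz
    
    Because each pass rebuilds the query string from %1, %2 and %3 only, any other URL parameters are dropped, which is consistent with the assumptions listed above.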

    This rule needs to be near the top of your .htaccess file.

    The second rule is your original redirect directive with the NE flag added (and a temporary 302 status for testing). It performs the actual redirect, changing the URL-path and renaming the search URL parameter to s.

    You will need to clear your browser cache before testing since the earlier 301 (permanent) redirect will have been cached by your browser. Test with 302 (temporary) redirects - to avoid caching issues - and only change to 301 when everything works as intended.