apache, mod-rewrite, escaping, special-characters

Encoding problems (special characters): Apache mod_rewrite -> PHP GET -> XML file name and tags


I'm trying to write a mod_rewrite rule that does the following: the URL /xml/regularString is redirected to /xml/script.php?code=regularString

This works very well with a simple rule such as:

RewriteRule ^xml/([A-Za-z0-9-]+)/?$ xml/script.php?code=$1 [NC]

(On a side note: Why is the ? important? I don't want there to be any optional redirection/match, but if I remove it it doesn't work)
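On the side note, assuming the `?` in question is the one in `/?$`: it only makes the trailing slash optional. Removing the `?` alone leaves a mandatory `/`, so a URL without a trailing slash stops matching. The three variants below sketch the difference; only the first line is from the question, the other two are illustrative.

```apache
# Trailing slash optional: matches /xml/regularString and /xml/regularString/
RewriteRule ^xml/([A-Za-z0-9-]+)/?$ xml/script.php?code=$1 [NC]

# "?" removed but "/" kept: matches only /xml/regularString/
RewriteRule ^xml/([A-Za-z0-9-]+)/$ xml/script.php?code=$1 [NC]

# Both "/" and "?" removed: matches only /xml/regularString
RewriteRule ^xml/([A-Za-z0-9-]+)$ xml/script.php?code=$1 [NC]
```

To rule out the optional match, the last form is the one to use, not the second.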

Anyway... what I really need is to pass even very uncommon special characters through in the same manner. But when I try to extend the rule, it either simply fails to match (404), or it jumbles the characters so that the PHP script can't even be reached, as in this error log entry:

404 etc... /xml/]*\xc2\xb0_\xc2\xb0*[

and the rule

RewriteRule ^/xml/([A-Za-z0-9-*°><_]+/?$ ... etc; with or without escaping them (via \)

(This was supposed to be: URL /xml/>*°_°*< redirected to /xml/script.php?code=>*°_°*<, or to some other encoding I can then decode in PHP to obtain the same result.)
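One likely reason the extended rule never matches, assuming it sits in the same .htaccess as the working rule: in per-directory context mod_rewrite strips the leading slash before matching, so a pattern starting with `^/xml/` cannot match where `^xml/` can. The sketch below removes that slash and restores the closing parenthesis of the capture group. The substitution is assumed to be the same as in the working rule, and the character class is left as posted. Whether `°` matches as intended depends on the file's encoding and is not verified here.

```apache
# Sketch only: same character class as the attempt above, with two changes:
#   1. no leading slash in the pattern (stripped in .htaccess context)
#   2. closing ")" restored for the capture group
# The substitution is an assumption, copied from the working rule.
RewriteRule ^xml/([A-Za-z0-9-*°><_]+)/?$ xml/script.php?code=$1 [NC]
```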

How can I make sure mod_rewrite always finds the PHP script and passes the code to it? I can figure out how to un-jumble it in PHP, but not in those RewriteRules, I'm afraid. I tried flags such as [B] and [NE], which the documentation describes as changing mod_rewrite's escaping behaviour, but the result was never better than a 404 with the aforementioned error log entry.


Solution

  • This worked for me under Debian/Apache2 in /.htaccess:

    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule ^xml/([^/]+)/?$ /xml/script.php?code=$1 [NC,R=301,L]
    

    It works even with Chinese ideographs, on an Apache2 server using UTF-8 encoding:

    http://www.example.com/xml/急冻红辣椒 => /xml/script.php?code=急冻红辣椒
    

    /xml/script.php must exist; if it does not, an infinite redirect loop is to be expected.
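The loop arises because a missing /xml/script.php passes the `!-f` test and matches `^xml/([^/]+)/?$` itself. A minimal sketch of one way to guard against that, assuming the same .htaccess location and the same rule as above: the extra first condition is an addition, not part of the original answer, and excludes the target script by URL so the rule cannot fire on its own output.

```apache
RewriteEngine On
# Added guard (assumption, not in the original answer): never rewrite the
# target script itself, whether or not the file exists on disk.
RewriteCond %{REQUEST_URI} !^/xml/script\.php$ [NC]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^xml/([^/]+)/?$ /xml/script.php?code=$1 [NC,R=301,L]
```

With this guard a missing script produces a plain 404 for /xml/script.php instead of a redirect loop.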