Tags: php, .htaccess, url, unicode, shorthand

.htaccess Unicode Recognition/Encoding


I am having a problem with the URL shorthands in my .htaccess.

Namely, everything works fine with this (now old) code...

# URL ShortCut Maker.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} .(.+)$
RewriteRule ^(.) ?open=encyclopedia&letter=$1&term=%1 [B,L,NC]

It correctly shows the URL as example.com/Modesty (and renders the page as if the URL were /?open=encyclopedia&letter=m&term=modesty), but the problem occurs when I enter any of these:
example.com/Šanti
example.com/Đin
example.com/Žal
example.com/Čakra
example.com/Ćof
For any of these URLs, the page is shown as if I had entered
?open=encyclopedia
and not
?open=encyclopedia&letter=Š&term=Šanti

EDIT: The problem is specifically with non-English letters.
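
For illustration only, here is a small Python sketch of what such a term looks like once it is encoded as UTF-8, which is how browsers normally send non-ASCII path characters. Python is used just as a neutral scripting language and is not part of the setup in question, and the helper name describe is made up for this example. A plausible reason for the old rule failing, though not one confirmed in this thread, is that ^(.) captures a single byte, which for Š is only half of the character.

```python
from urllib.parse import quote


def describe(term: str) -> dict:
    """Summarise how a path segment looks as UTF-8 bytes (hypothetical helper)."""
    raw = term.encode("utf-8")
    return {
        "percent_encoded": quote(term),  # form typically seen on the wire
        "characters": len(term),         # number of Unicode code points
        "bytes": len(raw),               # number of UTF-8 bytes
        "first_byte": raw[:1],           # what a one-byte capture would hold
    }


if __name__ == "__main__":
    for term in ["Modesty", "Šanti", "Đin", "Žal", "Čakra", "Ćof"]:
        print(term, describe(term))
```

For "Šanti" the percent-encoded form is %C5%A0anti: five characters but six bytes, and the first byte on its own (0xC5) is not a valid character.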

Any solution to this?


Solution

  • You can replace your existing rule with these rules:

    RewriteEngine On
    
    RewriteBase /
    
    # executes repeatedly as long as there is more than 1 space in URI
    RewriteRule "^(\S*)\s+(\S* .*)$" $1+$2 [N,NE]
    
    # executes when there is exactly 1 space in URI
    RewriteRule "^(\S*)\s(\S*)$" $1+$2 [L,R=302,NE]
    
    RewriteCond %{REQUEST_FILENAME} -f [OR]
    RewriteCond %{REQUEST_FILENAME} -d
    RewriteRule ^ - [L]
    
    RewriteRule ^([A-Z](?:[^\x00-\x7F]+|[A-Z])?).*$ ?open=encyclopedia&letter=$1&term=$0 [B,L,QSA]
    
    RewriteRule ^([^\x00-\x7F]+).*$ ?open=encyclopedia&letter=$1&term=$0 [B,L,QSA]
    

    The negated character class [^\x00-\x7F] matches any character outside the ASCII range, which is what lets the last rule capture a leading non-ASCII letter such as Š or Đ.
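
To see what the last two patterns capture, they can be tried outside Apache. The following is a rough Python sketch, not an exact model of mod_rewrite: it assumes the patterns are matched byte by byte against the UTF-8 path, and the function name split_letter_and_term is invented for the example.

```python
import re

# Byte-level copies of the two RewriteRule patterns from the answer.
ASCII_FIRST = re.compile(rb"^([A-Z](?:[^\x00-\x7F]+|[A-Z])?).*$")
NON_ASCII_FIRST = re.compile(rb"^([^\x00-\x7F]+).*$")


def split_letter_and_term(path: str):
    """Return (letter, term) as $1 and $0 would be captured, or None.

    Hypothetical helper: tries the patterns in the same order as the rules.
    """
    raw = path.encode("utf-8")
    for pattern in (ASCII_FIRST, NON_ASCII_FIRST):
        match = pattern.match(raw)
        if match:
            letter = match.group(1).decode("utf-8")  # $1
            term = match.group(0).decode("utf-8")    # $0
            return letter, term
    return None


if __name__ == "__main__":
    for path in ["Modesty", "Šanti", "Đin", "Žal", "Čakra", "Ćof"]:
        print(path, split_letter_and_term(path))
```

In this sketch "Modesty" yields ("M", "Modesty") and "Šanti" yields ("Š", "Šanti"), because [^\x00-\x7F]+ consumes both bytes of Š. Note that the patterns are case-sensitive as written (no NC flag), so a path starting with a lowercase ASCII letter matches neither of them here.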