I have a lot of spam on Google Analytics coming from different domains, and one of them is cyrillic encoded, so I'm in trouble to add it to my .htaccess
file.
I want to add с.новым.годом.рф
to the .htaccess
file to block it, but I don't know how to do that, because the file when saved don't preserve the cyrilic characters.
RewriteCond %{HTTP_REFERER} ^http(s)?://(www\.)?с.новым.годом.рф.*$ [NC]
is converted to
RewriteCond %{HTTP_REFERER} ^http(s)?://(www\.)??.?????.?????.??.*$ [NC]
I have searched a way to convert cyrillic to unicode but I had no success. Any suggestion?
Thanks
HTTP headers can't include arbitrary raw Unicode characters, so the Referer
header contains an ASCII URI rather than Cyrillic characters in an IRI.
So you need to use the URI-form in the rule to match. To convert an IRI to a URI you use URL-UTF-8 encoding on path parts, and the IDN algorithm on the hostname.
eg using Python:
>>> u'с.новым.годом.рф'.encode('idna')
'xn--q1a.xn--b1aube0e.xn--c1acygb.xn--p1ai'
So:
RewriteCond %{HTTP_REFERER} ^https?://(www\.)?xn--q1a\.xn--b1aube0e\.xn--c1acygb\.xn--p1ai.*$ [NC]
It would still be a good idea to find a text editor for your .htaccess files that doesn't destroy perfectly good Unicode characters though.