I do not have enough control over the remote server to install extensions; PHP is 5.3.8. But I've noticed that it is possible to split a UTF-8 string with PCRE.
So for example: preg_split('@@u','bücher',-1,PREG_SPLIT_NO_EMPTY);
gives: Array ( [0] => b, [1] => ├╝, [2] => c, [3] => h, [4] => e, [5] => r )
or, for the Chinese word 中国/中华, it gives: Array ( [0] => ńŞş, [1] => ňŤŻ, [2] => /, [3] => ńŞş, [4] => ňŹÄ )
(The results are from a non-Unicode display, which is why the multi-byte characters look garbled.) Still, it is clear that a UTF-8 string can be split
without the internationalization extensions, and then (I think) it should be possible to get the character codes and do calculations with them to create an ASCII URL.
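A minimal sketch of the "get character codes" step, assuming the input is valid UTF-8. The function names utf8_char_to_codepoint and utf8_codepoints are made up for illustration; only preg_split, strlen, ord and array_map are real PHP functions. It uses array() rather than [] so it should also run on PHP 5.3.

```php
<?php
// Hypothetical helper: convert one UTF-8 encoded character (1-4 bytes)
// to its Unicode code point. Assumes $char is a single valid character,
// e.g. one element returned by preg_split('@@u', ...).
function utf8_char_to_codepoint($char)
{
    $len  = strlen($char);   // byte length, not character count
    $lead = ord($char[0]);

    if ($len === 1) {
        return $lead;        // 0xxxxxxx: plain ASCII
    }

    // The lead byte carries 5, 4 or 3 payload bits for 2-, 3- and 4-byte sequences.
    $leadMasks = array(2 => 0x1F, 3 => 0x0F, 4 => 0x07);
    $code = $lead & $leadMasks[$len];

    // Each continuation byte (10xxxxxx) contributes 6 more bits.
    for ($i = 1; $i < $len; $i++) {
        $code = ($code << 6) | (ord($char[$i]) & 0x3F);
    }
    return $code;
}

// Hypothetical helper: split a string into characters with PCRE,
// then map each character to its code point.
function utf8_codepoints($str)
{
    $chars = preg_split('@@u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return array_map('utf8_char_to_codepoint', $chars);
}
```

For 'bücher' this should yield 0xFC for the ü, which could then be looked up in a transliteration table of your own (for example ü => ue) to build the ASCII URL.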
The only thing you need to know is the bit patterns that signal two-, three- and four-byte code points:
Table from http://en.wikipedia.org/wiki/UTF-8

Bits  Last code point  Octet 1   Octet 2   Octet 3   Octet 4
7     U+007F           0xxxxxxx  -/-       -/-       -/-
11    U+07FF           110xxxxx  10xxxxxx  -/-       -/-
16    U+FFFF           1110xxxx  10xxxxxx  10xxxxxx  -/-
21    U+10FFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
I don't speak PHP, but I'm quite sure existing code can be found that uses the bit patterns shown above to scan a UTF-8 byte sequence without actually interpreting it.
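A rough sketch of such a scanner in PHP, using only the lead-byte patterns from the table. The function names utf8_sequence_length and utf8_scan are invented for this example. It checks that continuation bytes look like 10xxxxxx, but it does not reject overlong encodings or surrogates, so treat it as an illustration rather than a full validator.

```php
<?php
// Hypothetical helper: byte length of the sequence announced by a lead byte,
// or 0 if the byte cannot start a sequence (e.g. a 10xxxxxx continuation byte).
function utf8_sequence_length($byte)
{
    if (($byte & 0x80) === 0x00) return 1; // 0xxxxxxx
    if (($byte & 0xE0) === 0xC0) return 2; // 110xxxxx
    if (($byte & 0xF0) === 0xE0) return 3; // 1110xxxx
    if (($byte & 0xF8) === 0xF0) return 4; // 11110xxx
    return 0;
}

// Hypothetical helper: cut a string into characters without decoding them.
// Returns false when a lead byte is invalid, a sequence is truncated,
// or a continuation byte does not match 10xxxxxx.
function utf8_scan($str)
{
    $chars = array();
    $total = strlen($str);
    $i = 0;

    while ($i < $total) {
        $len = utf8_sequence_length(ord($str[$i]));
        if ($len === 0 || $i + $len > $total) {
            return false;
        }
        for ($j = 1; $j < $len; $j++) {
            if ((ord($str[$i + $j]) & 0xC0) !== 0x80) {
                return false;
            }
        }
        $chars[] = substr($str, $i, $len);
        $i += $len;
    }
    return $chars;
}
```

This needs neither PCRE nor any extension, only ord, strlen and substr, so it should work on a restricted PHP 5.3 host.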