Let's say I have the word "Russian" written in Cyrillic. This would be the equivalent of the following in Hex:
&#1056;&#1091;&#1089;&#1089;&#1082;&#1080;&#1081;
My question is: how do I write a function that will go from "Russian" in Cyrillic to its hex value as above? Could the same function also work for single-byte characters?
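The &#12345; thingies are called HTML entities (strictly, numeric character references: the number is the character's decimal Unicode code point, not a hex value). In PHP there is a function that can create these, mb_encode_numericentity, which is part of the Multibyte String extension: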
$cyrillic = 'русский';
$encoding = 'UTF-8';
$convmap = array(0, 0xffff, 0, 0xffff); # start code point, end code point, offset, mask
$encoded = mb_encode_numericentity($cyrillic, $convmap, $encoding);
echo $encoded; # &#1088;&#1091;&#1089;&#1089;&#1082;&#1080;&#1081;
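For comparison, here is a minimal hand-rolled sketch of the same idea, assuming PHP 7.4+ with the mbstring extension (needed for mb_str_split and mb_ord). The function name to_numeric_entities and its $hex flag are hypothetical, chosen just for this illustration. It splits the string into characters, looks up each character's Unicode code point and emits it as a decimal reference, or as a hexadecimal &#x...; reference, which may be what the question meant by "hex". Because it works per character rather than per byte, single-byte (ASCII) characters are handled the same way.

```php
<?php
// Hypothetical helper: turns every character of a UTF-8 string into an
// HTML numeric character reference. With $hex = true it emits &#xHHH;
// instead of the decimal &#NNNN; form.
function to_numeric_entities(string $text, bool $hex = false): string
{
    $out = '';
    // Split into characters, not bytes (mb_str_split needs PHP 7.4+).
    foreach (mb_str_split($text, 1, 'UTF-8') as $char) {
        // Unicode code point of this character (mb_ord needs PHP 7.2+).
        $codePoint = mb_ord($char, 'UTF-8');
        $out .= $hex ? sprintf('&#x%X;', $codePoint) : '&#' . $codePoint . ';';
    }
    return $out;
}

echo to_numeric_entities('русский'), "\n";       // decimal references
echo to_numeric_entities('русский', true), "\n"; // hexadecimal references
```

For example, to_numeric_entities('abc') gives &#97;&#98;&#99;, so single-byte characters work too. In practice mb_encode_numericentity does the same job and also lets you restrict which code points get encoded.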
However, you need to know the encoding of your Cyrillic string. In this case I've chosen UTF-8; depending on the actual encoding, you need to adjust the $encoding parameter of the function and the $convmap array accordingly.
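To illustrate why the encoding matters, here is a sketch that assumes the same word arrives as Windows-1251 bytes (a common legacy Cyrillic encoding, picked only as an example) and that the PHP source file itself is saved as UTF-8. The byte length differs, but once mbstring is told the right input encoding, the resulting entities are the same, since they are built from the decoded Unicode code points.

```php
<?php
// Same convmap as above: encode every code point from U+0000 to U+FFFF.
$convmap = array(0, 0xffff, 0, 0xffff);

// The same word stored as Windows-1251 (one byte per letter) instead of
// UTF-8 (two bytes per Cyrillic letter).
$utf8   = 'русский';
$cp1251 = mb_convert_encoding($utf8, 'Windows-1251', 'UTF-8');
echo strlen($utf8), ' vs ', strlen($cp1251), " bytes\n";

// Pass the encoding the bytes are actually in; passing 'UTF-8' here
// would misread the Windows-1251 bytes.
$encoded = mb_encode_numericentity($cp1251, $convmap, 'Windows-1251');
echo $encoded, "\n";
```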