I'm trying to detect emoji in my PHP code and prevent users from entering them.
The code I have is:
if(preg_match('/\xEE[\x80-\xBF][\x80-\xBF]|\xEF[\x81-\x83][\x80-\xBF]/', $value) > 0)
{
//warning...
}
But it doesn't work for all emoji. Any ideas?
if(preg_match('/\xEE[\x80-\xBF][\x80-\xBF]|\xEF[\x81-\x83][\x80-\xBF]/', $value)
You really want to match Unicode at a character level, rather than trying to keep track of UTF-8 byte sequences. Use the u modifier to treat your UTF-8 string on a character basis.
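As a minimal sketch of what the u modifier changes, the snippet below counts how many times `.` matches a single emoji with and without it. The helper name `count_matches` is hypothetical, and the `"\u{...}"` string escape assumes PHP 7 or later.

```php
<?php
// Hypothetical helper: count how many times '.' matches in $s.
// With $unicode = true the pattern uses the u modifier, so PCRE
// walks the subject as UTF-8 characters instead of single bytes.
function count_matches($s, $unicode) {
    $pattern = $unicode ? '/./su' : '/./s';
    return preg_match_all($pattern, $s);
}

$tree = "\u{1F384}"; // U+1F384, four bytes in UTF-8 (F0 9F 8E 84)

var_dump(count_matches($tree, false)); // int(4): one match per byte
var_dump(count_matches($tree, true));  // int(1): one match per character
```

Without the modifier, a character class would have to describe every byte of the sequence by hand, which is what the pattern in the question attempts.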
Most emoji are encoded in the block U+1F300–U+1F5FF (Miscellaneous Symbols and Pictographs), with more in neighbouring blocks such as Emoticons (U+1F600–U+1F64F). However:
Many characters from Japanese carriers' ‘emoji’ sets are actually mapped to existing Unicode symbols, e.g. the card suits, zodiac signs and some arrows. Do you count these symbols as ‘emoji’ now?
There are still systems which don't use the newly-standardised Unicode emoji code points, instead using ad-hoc ranges in the Private Use Area. Each carrier had its own encoding; iOS 4 used the Softbank set. You may wish to block the entire Private Use Area.
For example:
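// Build the UTF-8 encoding of a single Unicode code point.
function unichr($i) {
    return iconv('UCS-4LE', 'UTF-8', pack('V', $i));
}
// Match the pictograph block or anything in the Private Use Area.
if (preg_match('/['.
    unichr(0x1F300).'-'.unichr(0x1F5FF).
    unichr(0xE000).'-'.unichr(0xF8FF).
']/u', $value)) {
    ...
}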
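The same check can probably be written without a helper by using PCRE's `\x{...}` code point escapes, which are available once the u modifier is set. This is only a sketch: the function name `contains_emoji` is hypothetical, the ranges are just the two used in the example above (so emoji in other blocks will slip through), and the `"\u{...}"` string escapes in the usage lines assume PHP 7 or later.

```php
<?php
// Hypothetical helper: true if $value contains a character from
// U+1F300-U+1F5FF or from the Private Use Area U+E000-U+F8FF.
function contains_emoji($value) {
    // Single-quoted so PHP passes \x{...} through to PCRE untouched.
    $pattern = '/[\x{1F300}-\x{1F5FF}\x{E000}-\x{F8FF}]/u';
    // preg_match returns 1 on a match, 0 on no match, and false on
    // error (for example when $value is not valid UTF-8).
    return preg_match($pattern, $value) === 1;
}

var_dump(contains_emoji("Merry Christmas \u{1F384}")); // bool(true)
var_dump(contains_emoji("private use: \u{E001}"));     // bool(true)
var_dump(contains_emoji("plain text"));                // bool(false)
```

Because invalid UTF-8 makes `preg_match` return false under the u modifier, this sketch reports such input as containing no emoji; you may prefer to reject it separately.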