php mysql html utf-8 utf8mb4

Remove emojis / unicode chars


My website and database are set to utf-8 and utf8mb4, respectively.

In textareas, it's perfectly fine for users to enter utf-8 symbols/emojis.

But on certain input fields (name, address, etc.) I want to remove the possibility of those "funny symbols" and only deal with basic text and numbers, including the Danish characters æøå, accents, and symbols like -_'@()?=,.:;!"#&<> etc.

How would I go about this?

Is there a native PHP function to strip unicode symbols/characters, or do I have to find/make a specific regex function for it?


Solution

  • There are functions for checking encoding, such as mb_check_encoding (http://php.net/manual/en/function.mb-check-encoding.php), but they only validate a string. To strip out characters, I think you would need to use a regex:

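    function StripNonUTF($str){
      // PHP's preg functions have no "g" flag (preg_replace already replaces every match);
      // the "u" flag makes PCRE treat the pattern and subject as UTF-8.
      return preg_replace('/[^\pL\pM[:ascii:]]+/u', '', $str);
    }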
    
    • \pL matches any kind of letter from any language
    • \pM matches a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.)
    • [:ascii:] matches a character with ASCII value 0 through 127
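
For what it's worth, here is a slightly stricter sketch of the same idea. The function name stripSymbols is made up for illustration, and it swaps [:ascii:] for the printable ASCII range \x20-\x7E so that tabs, newlines and other control characters are dropped as well. That is an assumption about what a name or address field should accept, not something the question asks for.

```php
<?php
// Hypothetical helper (name is illustrative): keep letters from any script,
// combining marks, and printable ASCII; drop everything else
// (emoji, pictographs, control characters).
function stripSymbols(string $str): string
{
    // /u makes PCRE treat both the pattern and the subject as UTF-8.
    // \x20-\x7E is printable ASCII, narrower than [:ascii:] (0 through 127).
    $clean = preg_replace('/[^\pL\pM\x20-\x7E]+/u', '', $str);

    // preg_replace returns null on failure, e.g. when the input is not valid UTF-8.
    return $clean ?? '';
}

echo stripSymbols("Søren😀 Ærø-vej 12✨"), "\n"; // prints "Søren Ærø-vej 12"
```

One caveat: some emoji are followed by a variation selector (U+FE0F), which Unicode classifies as a nonspacing mark, so \pM may leave that invisible character behind. If that matters, it would need to be stripped separately.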