php regex replace unicode sanitization

Remove unwanted multibyte characters without harming "foreign" characters


What is the best way to remove unwanted Unicode characters, such as the run of stacked marks below, without breaking legitimate foreign characters?

็็็็็็็็็็็็็็็็็็ ็็็็็็็็็


Solution

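  • If you filter the right Unicode ranges, this should work. Note that without the `u` modifier the pattern below matches bytes, not code points, so as written it strips the ASCII control characters and every byte of every multibyte UTF-8 character; narrow the ranges to the characters you actually want removed:

    $str = 'Your string with Unicode symbols';
    // preg_replace() returns the cleaned string; it does not modify $str in place.
    $clean = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $str);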
    

    https://3v4l.org/EY06F

    http://php.net/manual/en/function.preg-replace.php
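
    As an alternative to filtering byte ranges, the stacked marks in the question can be targeted by Unicode property instead. The following is only a minimal sketch, not the answer's method: `limit_combining_marks` and its `$max` parameter are hypothetical names, and it assumes the input is valid UTF-8 and that the unwanted characters are combining marks (`\p{M}`). It keeps up to `$max` marks per run and drops the rest, so base letters and ordinary accents should be left alone.

```php
<?php
// Hypothetical helper (not from the original answer): caps each run of
// combining marks at $max, so stacked marks collapse while letters and
// normal accents are preserved.
function limit_combining_marks(string $str, int $max = 2): string
{
    $max = max(0, $max);

    // \p{M} matches any Unicode combining mark. The `u` modifier makes PCRE
    // treat pattern and subject as UTF-8 code points instead of raw bytes.
    $pattern = '/(\p{M}{' . $max . '})\p{M}+/u';

    // Keep the first $max marks of a run (group 1) and drop the remainder.
    $result = preg_replace($pattern, '$1', $str);

    // preg_replace() returns null on failure (e.g. invalid UTF-8 input);
    // in that case this sketch simply returns the string unchanged.
    return $result ?? $str;
}

// Usage: a Thai consonant followed by 20 copies of U+0E47 (a combining mark).
$noisy = "\u{0E01}" . str_repeat("\u{0E47}", 20);
echo limit_combining_marks($noisy), "\n";
```

    Capping the run length rather than deleting every mark is a deliberate choice here: scripts such as Thai, and decomposed Latin accents, legitimately use one or two marks per base character, so removing `\p{M}` outright would damage normal text.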