
How to remove all multibyte characters in PHP?


I want to filter a variable and remove all multibyte characters except for a list of Persian characters that I have. How can I do that in PHP?
Edit #1:
Here is my code:

// variable
$str = ' سلامoff3 ';

// array of persian characters
$to = ['ا', 'ب', 'پ', 'ت', 'ث', 'ج', 'چ', 'ح', 'خ', 'د', 'ذ',
        'ر', 'ز', 'ژ', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ',
        'ف', 'ق', 'ک', 'گ', 'ل', 'م', 'ن', 'و', 'ه', 'ی', 'ء',];

I want to remove all multibyte characters except the Persian ones (the string contains Persian characters and one hidden multibyte character after the digit 3).
Edit #2:
The hidden character is not visible here, but it is visible in PhpStorm. I think Stack Overflow is filtering out invalid characters (which is what I want to do).


Solution

  • The straightforward way to do this is with PHP's mbstring functions:

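    $str = ' سلامoff3 '; // input string
    // Persian characters that are allowed to stay
    $to = ['ا', 'ب', 'پ', 'ت', 'ث', 'ج', 'چ', 'ح', 'خ', 'د', 'ذ', 'ر', 'ز', 'ژ', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ', 'ف', 'ق', 'ک', 'گ', 'ل', 'م', 'ن', 'و', 'ه', 'ی', 'ء',];
    $cleaned = '';
    $length = mb_strlen($str, 'UTF-8'); // number of characters, not bytes
    for ($i = 0; $i < $length; $i++) {
        $char = mb_substr($str, $i, 1, 'UTF-8'); // one whole character
        // keep single-byte characters and characters from the permitted list
        if (strlen($char) === 1 || in_array($char, $to, true)) {
            $cleaned .= $char;
        }
    }
    print_r($cleaned);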
    

    The idea is to walk through the string one character at a time (using the mb_* functions, so that each multibyte character is read as a whole) and to append a character to the new string only if it is either single-byte or in the permitted list.

    Note that this solution requires the mbstring extension.
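
If mbstring is not available, the same filtering can probably be done with a single regular expression. The following is only a sketch, not part of the original answer: it assumes the input is valid UTF-8, the helper name keepAsciiAndAllowed is made up for illustration, and "\u{200B}" (a zero-width space) stands in for the hidden character, since the real one did not survive in the question.

```php
<?php
// Hypothetical helper: keeps ASCII characters plus everything in $allowed.
function keepAsciiAndAllowed(string $str, array $allowed): string
{
    // Escape the allowed characters so they are safe inside a character class.
    $class = preg_quote(implode('', $allowed), '/');
    // Negated class: remove anything that is neither ASCII (\x00-\x7F)
    // nor one of the allowed characters. The /u modifier makes PCRE
    // treat the pattern and the subject as UTF-8.
    $result = preg_replace('/[^\x00-\x7F' . $class . ']/u', '', $str);
    // preg_replace() returns null on failure, e.g. for invalid UTF-8 input.
    return $result ?? '';
}

$to = ['ا', 'ب', 'پ', 'ت', 'ث', 'ج', 'چ', 'ح', 'خ', 'د', 'ذ',
       'ر', 'ز', 'ژ', 'س', 'ش', 'ص', 'ض', 'ط', 'ظ', 'ع', 'غ',
       'ف', 'ق', 'ک', 'گ', 'ل', 'م', 'ن', 'و', 'ه', 'ی', 'ء'];

// Assumption: a zero-width space plays the role of the hidden character.
$str = " سلامoff3\u{200B} ";
echo keepAsciiAndAllowed($str, $to);
```

Unlike the loop, this version returns an empty string when the input is not valid UTF-8, so it may be worth checking the input first if that case matters.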