How to remove these iconv-translated ASCII question mark characters from this string?


I'm transliterating user-submitted strings from UTF-8 to printable ASCII:

$str = 'Thê qúïck 😈 brõwn fõx júmps?😈 Óvér thé lázy dõg?😈';

$out = iconv('UTF-8', 'ASCII//TRANSLIT', $str);

var_dump($out);

// $out is now 'The quick ? brown fox jumps?? Over the lazy dog??'

I want the extra question marks that iconv added to $out removed, while keeping the question marks that were already in $str.

if ($out !== $str && strpos($out, '?') !== false) {
    // The input string was modified and contains at least one question mark
    //
    // Not even really sure where to begin
    //
    // Do we need to compare the position of every character from the
    // original string to every position of the new string and replace
    // where the original string did not contain a question mark?
    //
    // That's all I can think of, but there has to be a better way.
}

I want to keep all //TRANSLIT characters, including those few in the example $str above, e.g. áéïõú = aeiou. There is no other nuance to this question. I think it boils down to a string comparison and replace question.

I'm not necessarily looking for someone to write the entire code, just a pointer in the right direction of how you'd tackle this.


Solution

  • Here is a solution based on transliterator_transliterate():

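    // Latin-ASCII strips the accents (needs the intl extension); emoji pass through unchanged
    $str = transliterator_transliterate('Latin-ASCII', 'Thê qúïck 😈 brõwn fõx júmps?😈 Óvér thé lázy dõg?😈');
    // Remove every remaining non-ASCII byte
    $str = preg_replace('/[\x80-\xFF]/', '', $str);
    echo $str;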
    

    Output:

    The quick  brown fox jumps? Over the lazy dog?
    

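    Note that the emoji are kept by transliterator_transliterate(), so I used a regex to remove all the remaining non-ASCII characters. The original question marks survive, and a double space is left where the first emoji sat between two spaces.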
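
If you would rather stay with iconv(), one possible direction is to transliterate one character at a time, so that a question mark produced by iconv can be told apart from one that was typed by the user. The following is only a rough sketch: the helper name translitDropUnknown() is made up for illustration, and it assumes that iconv signals an untranslatable character with "?" (or by failing). What //TRANSLIT actually returns depends on the platform's iconv implementation and on the locale, so the result may differ between systems.

```php
<?php
// Hypothetical helper (not part of the original answer): transliterate to
// ASCII one character at a time and drop whatever iconv cannot represent.
function translitDropUnknown(string $str): string
{
    // Split into UTF-8 characters; preg_split() returns false on invalid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return '';
    }

    $out = '';
    foreach ($chars as $ch) {
        if (strlen($ch) === 1) {
            // A single-byte character is already ASCII, so a literal "?" is kept.
            $out .= $ch;
            continue;
        }
        // @ suppresses the notice iconv() raises when it cannot convert a character.
        $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $ch);
        if ($ascii === false || $ascii === '' || strpos($ascii, '?') !== false) {
            // Assumption: a "?" here is iconv's placeholder, so the character is dropped.
            continue;
        }
        $out .= $ascii;
    }

    return $out;
}

echo translitDropUnknown('Thê qúïck 😈 brõwn fõx júmps?😈 Óvér thé lázy dõg?😈'), PHP_EOL;
```

One limitation of this sketch: a non-ASCII character whose legitimate transliteration is itself a question mark would be dropped as well, because the check cannot distinguish that case from the placeholder.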