
iconv() - UTF-8 to ISO-8859-1 fails with German umlauts


I want to convert a string from UTF-8 to ISO-8859-1 in PHP. (Actually, I want to remove all characters that are not in the ISO-8859-1 character set.)

$text = "test 💂🏼‍♂️ test xäöüx x@x x€x";

$text = iconv('UTF-8', 'ISO-8859-1//IGNORE', $text);

the expected output would be: test test xäöüx x@x xx

but I get: test test x���x x@x xx

Why does iconv have problems with German umlauts? And why are they turned into question marks instead of being removed when in doubt?


Solution

  • Characters äöü (U+00E4, U+00F6 and U+00FC for what it's worth) have this encoding in ISO-8859-1:

    • ä: E4
    • ö: F6
    • ü: FC

    If we run a variation of your code with some additional debugging information:

    $text = 'äöü';
    $text = iconv('UTF-8', 'ISO-8859-1//IGNORE', $text);
    echo bin2hex($text);
    

    ... we get the expected output:

    e4f6fc
    

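    So the conversion itself works. The bytes E4 F6 FC are valid ISO-8859-1 but not valid UTF-8, so anything that reads them as UTF-8 shows � instead. You can see � for a few reasons, all of them related to whatever rendering tool you are using (a web browser, I presume):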

    • ISO-8859-1 not expected or supported.
    • Missing or incorrect encoding information.
    • Missing glyph in selected font (this is rare in browsers, since they use fallback fonts).
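
If the goal is only to strip characters that ISO-8859-1 cannot represent while still displaying the text as UTF-8, one option is to convert back after filtering. The following is a minimal sketch, not a definitive implementation: `keepLatin1Only` is a hypothetical helper name, and it assumes your iconv implementation honours `//IGNORE` the way it did in the question (some builds emit a notice or return false when characters are dropped).

```php
<?php
// Hypothetical helper: drop everything outside ISO-8859-1,
// then return the result as UTF-8 so a UTF-8 page or terminal can show it.
function keepLatin1Only(string $utf8): string
{
    // Step 1: UTF-8 -> ISO-8859-1, asking iconv to skip unconvertible characters.
    $latin1 = iconv('UTF-8', 'ISO-8859-1//IGNORE', $utf8);
    if ($latin1 === false) {
        // Assumption: treat a failed conversion as an error instead of guessing.
        throw new RuntimeException('iconv() could not convert the input');
    }

    // Step 2: ISO-8859-1 -> UTF-8. Every ISO-8859-1 byte maps to a Unicode
    // code point, so this direction cannot lose data.
    return iconv('ISO-8859-1', 'UTF-8', $latin1);
}

echo keepLatin1Only('xäöüx x@x'), "\n";
```

Alternatively, keep the ISO-8859-1 bytes and tell the browser what they are, for example with `header('Content-Type: text/html; charset=ISO-8859-1');` before any output is sent.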