java java-me utf-8 character-encoding iso-8859-1

How do I convert between ISO-8859-1 and UTF-8 in Java?


Does anyone know how to convert a string from ISO-8859-1 to UTF-8 and back in Java?

I'm getting a string from the web and saving it in the RMS (J2ME). I want to preserve the special characters, and then read the string back from the RMS in the ISO-8859-1 encoding. How do I do this?


Solution

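  • In general, you can't do this losslessly in both directions. UTF-8 can encode every Unicode code point; ISO-8859-1 covers only the first 256 of them (U+0000 to U+00FF). So transcoding from ISO-8859-1 to UTF-8 is no problem, but going backwards from UTF-8 to ISO-8859-1 loses every character outside that range. With String.getBytes, unsupported characters are typically replaced by "?" (the documentation for getBytes(String) leaves the exact behavior unspecified). The Unicode replacement character (�) shows up in the opposite situation, when bytes that are not valid in the source encoding are decoded into a String.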

    To transcode text:

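    byte[] latin1 = ...
    // Decode the bytes as ISO-8859-1, then re-encode the resulting String as UTF-8.
    // Both calls can throw UnsupportedEncodingException.
    byte[] utf8 = new String(latin1, "ISO-8859-1").getBytes("UTF-8");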
    

    or

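    byte[] utf8 = ...
    // Decode the bytes as UTF-8, then re-encode the resulting String as ISO-8859-1.
    // Characters above U+00FF have no ISO-8859-1 byte and are lost here.
    byte[] latin1 = new String(utf8, "UTF-8").getBytes("ISO-8859-1");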
    

    You can exercise more control by using the lower-level Charset APIs in java.nio.charset (Java SE; this package is not part of J2ME/CLDC). For example, a CharsetEncoder can raise an exception when an un-encodable character is found, or use a different byte sequence as the replacement.
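
As a rough illustration of the Charset-level control described above, here is a minimal sketch for Java SE 7 or later. The class name TranscodeSketch and its method names are made up for this example, and none of it will compile on J2ME, where java.nio is not available. It shows one strict conversion that throws on anything ISO-8859-1 cannot represent, and one that substitutes a caller-chosen byte.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class TranscodeSketch {

    // Strict UTF-8 -> ISO-8859-1: throws instead of silently substituting.
    static byte[] utf8ToLatin1Strict(byte[] utf8) throws CharacterCodingException {
        // Reject malformed UTF-8 input rather than turning it into U+FFFD.
        CharBuffer chars = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(utf8));
        // Reject characters that have no ISO-8859-1 byte (anything above U+00FF).
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return toArray(encoder.encode(chars));
    }

    // Lenient UTF-8 -> ISO-8859-1: unsupported characters become `replacement`.
    static byte[] utf8ToLatin1Replacing(byte[] utf8, byte replacement)
            throws CharacterCodingException {
        String text = new String(utf8, StandardCharsets.UTF_8);
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(new byte[] { replacement });
        return toArray(encoder.encode(CharBuffer.wrap(text)));
    }

    // Copy the readable portion of a ByteBuffer into a fresh array.
    private static byte[] toArray(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        buffer.get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        // "café €5": the euro sign (U+20AC) is not in ISO-8859-1.
        byte[] utf8 = "caf\u00e9 \u20ac5".getBytes(StandardCharsets.UTF_8);

        byte[] latin1 = utf8ToLatin1Replacing(utf8, (byte) '_');
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1));

        try {
            utf8ToLatin1Strict(utf8);
        } catch (CharacterCodingException e) {
            System.out.println("Not representable in ISO-8859-1: " + e);
        }
    }
}
```

The strict variant is probably the safer default when data loss must be detected; the replacing variant mirrors what String.getBytes usually does, but with a replacement byte you pick yourself.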