Tags: java, localization, character-encoding, locale

From locale to ANSI code page to Java charset?


Is there a way to get a java.nio.charset.Charset from an ANSI code page, and the ANSI code page from a locale? For example, if I have the locale "en_US", I want the charset "cp1252", so that I can call

private final Charset CS1252 = Charset.forName("cp1252");

or, when I have the locale "ja_JP" for Japanese, I want to get the corresponding charset, like

private final Charset CS932 = Charset.forName("ms932");

How can I achieve that in Java? What I need is a method like getCharsetForLocale(java.util.Locale loc).


Solution

  • You can't, and it does not really make sense: a locale does not determine a character encoding. Any language can be written in several different encodings. English, for example, can be written in ASCII, ISO-8859-1, ISO-8859-15, Windows-1252, UTF-7, UTF-8, UTF-16, UTF-32 and many more, including practically every Windows code page.
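A minimal sketch of that point, using only the standard library: it asks a handful of charsets whether they can represent a given string. The class and method names (`EncodingChoices`, `charsetsThatCanEncode`) are made up for illustration, and the candidate list is an arbitrary sample rather than anything derived from a locale.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EncodingChoices {

    // Returns the names of those candidate charsets that can encode the text without loss.
    static List<String> charsetsThatCanEncode(String text, List<Charset> candidates) {
        List<String> names = new ArrayList<>();
        for (Charset cs : candidates) {
            if (cs.newEncoder().canEncode(text)) {
                names.add(cs.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Charset> candidates = new ArrayList<>(List.of(
                StandardCharsets.US_ASCII,
                StandardCharsets.ISO_8859_1,
                StandardCharsets.UTF_8,
                StandardCharsets.UTF_16));
        // windows-1252 is not guaranteed to be present on every runtime, so check first.
        if (Charset.isSupported("windows-1252")) {
            candidates.add(Charset.forName("windows-1252"));
        }

        // Plain English text: every candidate qualifies, so the locale alone picks nothing.
        System.out.println(charsetsThatCanEncode("Hello, world", candidates));
        // Japanese text (escaped here to keep the source file ASCII): only the Unicode encodings qualify.
        System.out.println(charsetsThatCanEncode("\u65e5\u672c\u8a9e", candidates));
    }
}
```

For English text several answers are equally valid, which is why a single "charset for locale" lookup has no well-defined result.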

    I am not sure what you are looking for, so let me suggest this:

    1. If you are looking to save the data, use UTF-8 regardless of locale. Always. Yes, always. Don't worry about the space: for many languages it is efficient enough, and disk space is cheap.

    2. If you want to know which character encoding users might use, it is not valid to assume they are restricted to a single one. Instead, consider detecting the encoding, for example with the ICU Charset Detector (read more about detection here).

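    3. If you want to know the current code page of the system, the easiest way to do that (and it is OS independent!) is to call Charset.defaultCharset(). Note that on JDK 18 and later this returns UTF-8 by default, and the platform's own encoding is exposed through the native.encoding system property instead.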
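Suggestions 1 and 3 can be sketched with the standard library alone. The class and method names (`Utf8Storage`, `saveAndReload`) are hypothetical, and the temporary file only stands in for wherever the data would really be stored. The `native.encoding` property is an addition that the answer does not mention; it is available from JDK 17 and is printed next to `Charset.defaultCharset()` for comparison.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Utf8Storage {

    // Writes the text to a temporary file as UTF-8 and reads it back,
    // naming the charset explicitly so the JVM locale and default charset play no role.
    static String saveAndReload(String text) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Mixed-language text (escaped to keep the source file ASCII) survives the round trip.
        String text = "English, \u65e5\u672c\u8a9e, \u00e4\u00f6\u00fc";
        System.out.println(text.equals(saveAndReload(text)));

        // Suggestion 3: the JVM's default charset.
        System.out.println("default charset: " + Charset.defaultCharset());
        // The platform encoding as seen by the JVM; prints null on JDKs older than 17.
        System.out.println("native.encoding: " + System.getProperty("native.encoding"));
    }
}
```

Passing the charset explicitly on both the write and the read is the design choice that matters here: the stored bytes no longer depend on which machine or locale the program runs under.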

    Next time, please try to describe your problem first, what you want to achieve and what you have already tried.