java locale decimalformat iso-639

DecimalFormatSymbols has unexpected region-specific values for unorthodox locales


With the help of DecimalFormatSymbols you can request locale-based characteristics, such as the decimal separator or the thousands separator.

As long as you request it for common language tags (e.g. de-AT, en-US), it works as expected. But if you mix language and country combinations, it behaves oddly. Let's take a look at the thousands separator in particular (for English it is a comma, for German it is a dot):

    System.out.println("en-US: " + DecimalFormatSymbols.getInstance(Locale.US).getGroupingSeparator());
    System.out.println("de-DE: " + DecimalFormatSymbols.getInstance(Locale.GERMANY).getGroupingSeparator());
    System.out.println("de-US: " + DecimalFormatSymbols.getInstance(new Locale.Builder().setLanguage("de").setRegion("US").build()).getGroupingSeparator());
    System.out.println("de: "+DecimalFormatSymbols.getInstance(new Locale.Builder().setLanguage("de").build()).getGroupingSeparator());
    System.out.println("DE: " + DecimalFormatSymbols.getInstance(new Locale.Builder().setRegion("DE").build()).getGroupingSeparator());
    System.out.println("ru-RU: " + DecimalFormatSymbols.getInstance(new Locale.Builder().setLanguage("ru").setRegion("RU").build()).getGroupingSeparator());
    System.out.println("RU: " + DecimalFormatSymbols.getInstance(new Locale.Builder().setRegion("RU").build()).getGroupingSeparator());

The result is:

en-US: ,
de-DE: .
de-US: .
de: .
DE: ,
ru-RU: 0xA0 (non-breaking space, decimal 160)
RU: ,

For de-US it returns a dot as the separator, which is the German separator, not the US one. It looks as if only the language tag is taken into account.

If I create a locale that only has country information (language missing), it seems that the English separator is always returned.

How can I tackle this properly? I want the format for the most specific information in the locale. For de, I want the German format. For de-US, I want the English (US) format.


Solution

  • Locale-related information, such as DecimalFormatSymbols, is generally stored in the Java Runtime Library in ResourceBundle files.

    See the ResourceBundle javadoc for full details, but the relevant part is:

    Resource bundles belong to families whose members share a common base name, but whose names also have additional components that identify their locales. For example, the base name of a family of resource bundles might be "MyResources". The family should have a default resource bundle which simply has the same name as its family - "MyResources" - and will be used as the bundle of last resort if a specific locale is not supported. The family can then provide as many locale-specific members as needed, for example a German one named "MyResources_de".

    If there are different resources for different countries, you can make specializations: for example, "MyResources_de_CH" contains objects for the German language (de) in Switzerland (CH). If you want to only modify some of the resources in the specialization, you can do so.

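    So symbol lookup first tries the language-country combination (e.g. de_US). If no data exists for that combination, it falls back to the language alone (de), and finally to the base file, which holds the default values. That is why de-US yields the German separator: there is no de_US data, so the de data is used.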

    The default value for getGroupingSeparator is a comma (,). A locale with only a region, such as DE or RU, has no language to fall back to, so the lookup ends at the base file and you get that default.
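
The fallback order described above can be made visible with a small sketch. It asks the default ResourceBundle.Control for the candidate locales of a given locale, and prints the grouping separator of the root locale as a code point. The class name LocaleFallbackDemo and the helper candidates are made up for illustration, and "MyResources" is just the javadoc's example base name. The JDK's own locale data providers may differ in detail from plain ResourceBundle lookup, so treat this as an illustration of the general mechanism rather than the exact code path behind DecimalFormatSymbols.

```java
import java.text.DecimalFormatSymbols;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class LocaleFallbackDemo {

    // Hypothetical helper: returns the fallback chain that the default
    // ResourceBundle lookup would walk for the given locale.
    // "MyResources" is only the example base name from the javadoc.
    static List<Locale> candidates(Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_DEFAULT);
        return control.getCandidateLocales("MyResources", locale);
    }

    public static void main(String[] args) {
        Locale deUs = new Locale.Builder().setLanguage("de").setRegion("US").build();
        Locale regionOnly = new Locale.Builder().setRegion("DE").build();

        for (Locale locale : new Locale[] {deUs, regionOnly}) {
            StringBuilder chain = new StringBuilder();
            for (Locale candidate : candidates(locale)) {
                if (chain.length() > 0) {
                    chain.append(" -> ");
                }
                // The root locale has an empty name, so print language tags instead.
                chain.append(candidate.toLanguageTag());
            }
            System.out.println(locale.toLanguageTag() + ": " + chain);
        }

        // Print the root locale's grouping separator as a code point, so that
        // invisible characters (such as a non-breaking space) would show up too.
        char rootSeparator = DecimalFormatSymbols.getInstance(Locale.ROOT).getGroupingSeparator();
        System.out.printf("root grouping separator: U+%04X%n", (int) rootSeparator);
    }
}
```

For de-US the chain contains de before the root locale, which matches the German separator seen in the question. For a region-only locale there is no language-only entry between the locale itself and the root.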