
What is a good definition for language codes and locale codes?


  • When should I use en_GB and when en-GB?
  • What is the difference?
  • Is there an ISO name for the combination of ISO 639-1 (language) and ISO 3166 (country) codes?


  • Solution

  • There are several systems for locale identifiers. Many of them look similar at first glance, but not when you go deeper:

    Some examples (Serbian-Serbia with Latin Script, Japanese-Japan with radical sorting):

    • UTS-35, ICU, Mac OS X, Flash: sr-Latn-RS, ja-JP@collation=radical
    • Newer UTS-35, BCP 47 extension U: sr-Latn-RS, ja-JP-u-co-unihan
    • Win 2000, XP: 0x81a, 0x10411
    • Vista, Win 7: sr-Latn-CS, ja-JP_radical
    • Java: sr_CS, ja_JP
    • Java 7: sr_RS, ja_JP
    • Linux: sr_RS@latin, ja_JP.utf8

    Think of it like different ways to talk about colors (RGB, CMYK, HSV, Pantone, etc.)

    So the question of - vs. _ only makes sense once you specify the environment you are using. Use - and Java will not understand it; use _ and Windows will not understand it. ICU (and systems built on top of it) accepts both - and _, but produces the _ style.

    There is no ISO standard that covers the language-country combination. But there are ISO standards that cover the individual parts (language, country, script). Which version of each ISO standard applies also depends on the locale identifier system in use.


    In general you should accept both _ and - but generate only one, as ICU does ("be liberal in what you accept and strict in what you emit").
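
    The "accept both, emit one" advice can be sketched as a small normalizer. This is only an illustration, not ICU's actual algorithm: normalize_locale is a hypothetical helper, the casing follows the usual BCP 47 convention (lowercase language, title-case script, uppercase region, lowercase after an extension singleton such as "u"), and POSIX suffixes such as ".utf8" or "@latin" are deliberately not handled.

```python
def normalize_locale(identifier: str, separator: str = "-") -> str:
    """Accept '-' or '_' in any letter case; emit one separator and one casing.

    Hypothetical helper for illustration; it does not validate subtags
    against any registry.
    """
    parts = identifier.replace("_", "-").split("-")
    if not (parts[0].isascii() and parts[0].isalpha()):
        raise ValueError(f"not a locale identifier: {identifier!r}")

    out = [parts[0].lower()]      # language subtag is lowercase
    after_singleton = False       # True once an extension such as 'u' starts
    for part in parts[1:]:
        if len(part) == 1:
            after_singleton = True
            out.append(part.lower())
        elif after_singleton:
            out.append(part.lower())   # extension keys and values
        elif len(part) == 4 and part.isalpha():
            out.append(part.title())   # script, e.g. Latn
        elif len(part) == 2 and part.isalpha():
            out.append(part.upper())   # region, e.g. GB
        else:
            out.append(part.lower())   # variants and anything else
    return separator.join(out)


if __name__ == "__main__":
    for raw in ("en_gb", "EN-GB", "sr_latn_rs", "ja-JP-u-co-unihan"):
        print(raw, "->", normalize_locale(raw))
```

    Passing separator="_" gives the underscore style instead, so the same function can sit at the boundary of a system that expects either form.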

    If you communicate with systems that use another type of locale identifier, you will have to map to and from your own system, and that will force you to use _ or -. Some of the mappings will be lossy (there is no way to specify alternate calendars on Windows or Linux, or alternate sorting or scripts in Java older than 7, etc.), and round-tripping might not be possible (somewhat like converting between RGB and CMYK).
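
    To make the lossiness concrete, here is a rough sketch of a one-way mapping from a BCP 47 style tag to a Linux style name, using the two examples from the list above. The function name bcp47_to_posix and the small script table are assumptions made for illustration; a real mapping would need a much larger table and the encoding suffix (".utf8") is left out.

```python
# Illustrative table only: script subtag -> Linux-style "@modifier".
SCRIPT_TO_MODIFIER = {"Latn": "latin", "Cyrl": "cyrillic"}


def bcp47_to_posix(tag: str):
    """Map e.g. 'sr-Latn-RS' to 'sr_RS@latin'.

    Returns (posix_name, dropped), where dropped lists the subtags that
    could not be expressed in the target style.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    script = region = None
    dropped = []

    for index, sub in enumerate(subtags[1:], start=1):
        if len(sub) == 1:
            # Extension singleton ('u', 't', 'x', ...): everything from
            # here on has no equivalent in this sketch.
            dropped.extend(subtags[index:])
            break
        if len(sub) == 4 and sub.isalpha():
            script = sub.title()
        elif len(sub) == 2 and sub.isalpha():
            region = sub.upper()
        else:
            dropped.append(sub)

    name = language
    if region:
        name += "_" + region
    if script:
        modifier = SCRIPT_TO_MODIFIER.get(script)
        if modifier:
            name += "@" + modifier
        else:
            dropped.append(script)
    return name, dropped


if __name__ == "__main__":
    print(bcp47_to_posix("sr-Latn-RS"))
    print(bcp47_to_posix("ja-JP-u-co-unihan"))
```

    Whatever ends up in the dropped list is exactly the information that cannot survive a round trip through the target system.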

    Addition: things differ not only between systems, they also change over time. For instance, Java 7 added support for sr_RS and for scripts, Windows keeps adding support for more locales, new countries get created (Sudan split, Russia, Serbia) or disappear (East Germany, U.S.S.R., Yugoslavia), and so on.

    For the internal representation you might want to choose the most powerful system, one that can represent everything the others can, and that is UTS-35 / BCP 47 (also used by CLDR and ICU).
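
    One possible shape for such an internal representation is a small structured record that is only turned into a system-specific string at the boundary. The class below is a sketch under that assumption: LocaleId, its field names, and the two render methods are hypothetical, and only the "u" extension keywords are modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocaleId:
    """Structured locale kept internally; rendered to strings on demand."""
    language: str
    script: str = ""
    region: str = ""
    keywords: tuple = ()   # pairs such as (("co", "unihan"),)

    def to_bcp47(self) -> str:
        """Full-fidelity rendering, e.g. 'ja-JP-u-co-unihan'."""
        parts = [self.language]
        if self.script:
            parts.append(self.script)
        if self.region:
            parts.append(self.region)
        if self.keywords:
            parts.append("u")
            for key, value in sorted(self.keywords):
                parts += [key, value]
        return "-".join(parts)

    def to_underscore(self) -> str:
        """Reduced 'language_REGION' rendering; script and keywords are lost."""
        return self.language + ("_" + self.region if self.region else "")


if __name__ == "__main__":
    serbian = LocaleId("sr", script="Latn", region="RS")
    japanese = LocaleId("ja", region="JP", keywords=(("co", "unihan"),))
    print(serbian.to_bcp47(), serbian.to_underscore())
    print(japanese.to_bcp47(), japanese.to_underscore())
```

    Keeping the richer form internally means the lossy step happens only once, at the point where a less expressive system is addressed.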