locale multilingual globalization

Which standard language codes should I use for multilingual software?


I often see the tag "en-US", whose "en" part is one of the 2-character language codes standardized in ISO 639-1. I also understand that a language tag generally consists of a primary language subtag followed by a series of other subtags separated by hyphens, as explained in https://www.rfc-editor.org/rfc/rfc5646.

That link mentions that there are also 3-letter language codes defined in ISO 639-2, ISO 639-3, and ISO 639-5.

Still, there are more codes defined for Windows/.NET here: http://msdn.microsoft.com/en-us/goglobal/bb896001.aspx. That page refers to the language tags as "culture names" and uses a distinct 3-character code for the "language name". So the "culture name" appears to be built on the 2-character language codes, although I'm not sure why the names vary between Windows versions, or how closely they follow the standard language codes. Is "en-US" really a "language code", or is it a "culture name"?

If I'm developing software that uses language codes, which standard should I use? (The 2-character codes or the 3-character codes? If 3-character, then ISO 639-2, 639-3, or 639-5?)

Why should I choose one over the other? (For OS platform or programming framework compatibility?)


Solution

  • BCP 47 is the industry best-practice standard for identifying languages, so you should use its language tags. BCP 47 dictates that when a language has both a 2-letter and a 3-letter code, the 2-letter code should be used.

    Cultures and locales differ from language tags in how they treat the region information. The region in a language tag identifies the origin of the particular dialect (en-US is American English, the variety of English that originated in the United States), whereas the region in a locale identifies the location where the information is relevant. Since the majority of American English speakers also live in the US, the distinction rarely matters when it comes to providing information such as how to spell words or format dates and numbers.

    Windows is moving away from the concept of a locale or culture toward a more expressive notion of language and region (identified separately), which allows us to describe situations such as a speaker of American English who resides in England.

    Note that there are cases where Windows still uses legacy names that predate this standard. Depending on how you rely on the OS, you may need to map between the standards-compliant names and the legacy names.
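To make the two rules above concrete (prefer the 2-letter code when one exists, and map legacy names to standard tags), here is a minimal Python sketch. It is not a full BCP 47 parser: it only handles the simple `language[-script][-region]` shape, and the function name `normalize_tag` and both lookup tables are hypothetical, illustrative subsets rather than complete registries. The two legacy entries (`zh-CHS`, `zh-CHT`) are given as examples of older .NET culture names; check the names your own platform actually returns.

```python
import re

# Illustrative subset only: 3-letter ISO 639-2/3 codes that also have a
# 2-letter ISO 639-1 code. A real application would use the full
# IANA Language Subtag Registry instead of this hand-written table.
THREE_TO_TWO = {
    "eng": "en", "fra": "fr", "fre": "fr", "deu": "de", "ger": "de",
    "spa": "es", "zho": "zh", "chi": "zh", "jpn": "ja",
}

# Illustrative subset only: legacy culture names mapped to standard tags.
LEGACY_TO_STANDARD = {
    "zh-chs": "zh-Hans",
    "zh-cht": "zh-Hant",
}

# Simplified shape: language, optional 4-letter script, optional region
# (2 letters or 3 digits). Variants, extensions, and private-use subtags
# are deliberately not supported in this sketch.
_SIMPLE_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def normalize_tag(tag: str) -> str:
    """Return a conventionally cased tag, preferring 2-letter language codes."""
    # Accept POSIX-style underscores ("en_US") as well as hyphens.
    tag = tag.replace("_", "-")
    # Replace a known legacy name with its standard equivalent first.
    tag = LEGACY_TO_STANDARD.get(tag.lower(), tag)

    match = _SIMPLE_TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported or malformed language tag: {tag!r}")

    language = match.group("language").lower()
    # Use the 2-letter code whenever the language has one.
    language = THREE_TO_TWO.get(language, language)

    parts = [language]
    if match.group("script"):
        parts.append(match.group("script").title())   # e.g. "Hant"
    if match.group("region"):
        parts.append(match.group("region").upper())   # e.g. "US"
    return "-".join(parts)


print(normalize_tag("eng-us"))      # en-US
print(normalize_tag("zh-CHS"))      # zh-Hans
print(normalize_tag("zh-hant-tw"))  # zh-Hant-TW
print(normalize_tag("haw-US"))      # haw-US (Hawaiian has no 2-letter code)
```

Normalizing at the boundary like this lets the rest of the application store and compare one canonical form, with the legacy table consulted only where the OS hands back an older name.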