javascript localization react-intl

What format of locale does React Intl use?


According to the React Intl documentation, to add support for a particular language to your application, you need to import a relevant language file from the locale-data/ directory and pass it to a React Intl function called addLocaleData().
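
The step described above can be sketched roughly as follows. This assumes the React Intl v2 layout the question refers to, where each file under locale-data/ exports an array of entries carrying a `locale` field and `addLocaleData()` accepts such an array. The helper name `registerLocaleData` and the stub data are hypothetical and only stand in for the real imports:

```javascript
// Hypothetical helper: merge several locale-data modules and register them in one call.
// `addLocaleData` is passed in so the sketch runs without React Intl installed;
// in an app it would be the function imported from 'react-intl'.
function registerLocaleData(addLocaleData, localeModules) {
  // Each module is assumed to be an array of entries shaped like { locale: 'en', ... }.
  const entries = localeModules.reduce((all, mod) => all.concat(mod), []);
  addLocaleData(entries);
  // Return the registered locale codes for inspection.
  return entries.map((entry) => entry.locale);
}

// Stand-ins for the modules under react-intl/locale-data/ (en.js and fr.js).
const enStub = [{ locale: 'en' }];
const frStub = [{ locale: 'fr' }];

// A stub that records what would be handed to React Intl.
const registered = [];
const codes = registerLocaleData((data) => registered.push(...data), [enStub, frStub]);
console.log(codes); // [ 'en', 'fr' ]
```

In a real application the stubs would be replaced by the modules imported from `react-intl/locale-data/en` and `react-intl/locale-data/fr`, and the first argument by `addLocaleData` from `react-intl`.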

When it comes to languages like English or French, everything is easy as you have en.js and fr.js that follow a familiar two-letter code naming convention. However, the locale-data directory also includes a whole range of files with three-letter language codes that I've never come across before, e.g.: agq, guz or kkj.

I am looking for some documentation that would help me map these language codes to their corresponding languages.

The reason I need this is that I am trying to find the language files for Austrian German and Brazilian Portuguese, and I simply don't know which codes they are filed under. I thought the former might be available as at.js, but there is no such file, and the latter as br.js, but that file contains a different language.


Solution

  • TL;DR

    It's using ISO 639, specifically:

    • ISO 639-1 (two-letter codes) for languages covered by ISO 639-1, and
    • ISO 639-3 for languages not covered by ISO 639-1 but covered by ISO 639-3

    You can find all of the codes here. It lists both the ISO 639-1 code (where there is one) and ISO 639-3 code (as well as the ISO 639-2/T and ISO 639-2/B codes, which I believe are obsolete).


    Details:

    The tag says "Combines react components with FormatJS." A quick search for FormatJS turns up its website, which on its about page says:

    Industry Standards

    FormatJS builds on the ECMAScript Internationalization API (ECMA-402), uses locale data from the CLDR, and works with the industry standard ICU Message syntax used by professional translators.

    (my emphasis)

    Following the CLDR link takes you to the CLDR page on unicode.org, which describes the format and links to the download page. Kicking around the CLDR site, it mentions ISO 639-3 language codes. Kicking around their "current work", we can find a UTF-8 list of those codes here, which lists all three you mentioned: agq (Aghem), guz (listed twice; Ekegusii and Gusii), and kkj (Kako).

    But, in a comment, user3775501 pointed out that the only code he/she had checked (for Welsh) was cym, but that looking at node_modules/react-intl/locale-data, it was cy instead. It's clearly cym in ISO 639-3, but it's cy in ISO 639-1; so apparently, they're using ISO 639 as a whole, not just ISO 639-3. ISO 639 defines two-letter codes (ISO 639-1) and three-letter codes (ISO 639-3) (there was a proposal for four-letter codes that has been withdrawn, and separately an ISO 639-2 which apparently had two parts, T and B, which I believe is obsolete). This page by SIL International, the registration authority for ISO 639-3, lists both the two-letter (ISO 639-1) and three-letter (ISO 639-3) codes. On the first page of codes starting with c we find Welsh, which is cy in ISO 639-1, cym in ISO 639-2/T, wel in ISO 639-2/B, and cym in ISO 639-3. (The Welsh name for Welsh is Cymraeg, hence cy/cym).

    Looking at node_modules/react-intl/locale-data, we can see both two-letter and three-letter codes. For instance, here are the c's:

    ca
    ce
    cgg
    chr
    ckb
    cs
    cu
    cy
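
A quick way to see the mix of two-letter and three-letter names is to bucket them by length. This is a minimal sketch using only the names from the listing above; `groupByCodeLength` is a hypothetical helper, and length alone says nothing yet about which standard a code comes from:

```javascript
// File names (without the .js extension) copied from the listing above.
const cFiles = ['ca', 'ce', 'cgg', 'chr', 'ckb', 'cs', 'cu', 'cy'];

// Hypothetical helper: bucket codes into two-letter, three-letter, and anything else.
function groupByCodeLength(codes) {
  const groups = { twoLetter: [], threeLetter: [], other: [] };
  for (const code of codes) {
    if (/^[a-z]{2}$/.test(code)) groups.twoLetter.push(code);
    else if (/^[a-z]{3}$/.test(code)) groups.threeLetter.push(code);
    else groups.other.push(code);
  }
  return groups;
}

const grouped = groupByCodeLength(cFiles);
console.log(grouped.twoLetter);   // [ 'ca', 'ce', 'cs', 'cu', 'cy' ]
console.log(grouped.threeLetter); // [ 'cgg', 'chr', 'ckb' ]
```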
    

    Looking at SIL International's list for c, we find:

    • ca - ISO 639-1 code for Catalan, Valencian; its ISO 639-3 code is cat
    • ce - ISO 639-1 code for Chechen; its ISO 639-3 code is che
    • cgg - ISO 639-3 code for Chiga, which has no ISO 639-1 code
    • chr - ISO 639-3 code for Cherokee, which has no ISO 639-1 code
    • ckb - ISO 639-3 code for Central Kurdish, which has no ISO 639-1 code
    • cs - ISO 639-1 code for Czech; its ISO 639-3 code is ces
    • cu - ISO 639-1 code for Church Slavic, Church Slavonic, Old Bulgarian, Old Church Slavonic, Old Slavonic; its ISO 639-3 code is chu
    • cy - ISO 639-1 code for Welsh; its ISO 639-3 code is cym

    So it would appear that they're using ISO 639-1 codes where they exist, and ISO 639-3 codes if there is no ISO 639-1 code.
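
That apparent rule (two-letter code if one exists, otherwise the three-letter code) can be written down as a small lookup. The table below contains only the pairs from the list above, and `localeDataCode` is a hypothetical helper name; it is a sketch of the observed pattern, not something taken from React Intl or CLDR:

```javascript
// ISO 639-3 -> ISO 639-1 pairs from the list above; null means there is no two-letter code.
const iso6391ByIso6393 = new Map([
  ['cat', 'ca'],
  ['che', 'ce'],
  ['cgg', null],
  ['chr', null],
  ['ckb', null],
  ['ces', 'cs'],
  ['chu', 'cu'],
  ['cym', 'cy'],
]);

// Hypothetical helper: guess the locale-data file name for an ISO 639-3 code.
function localeDataCode(iso6393) {
  // Codes outside this small sample table are reported as unknown.
  if (!iso6391ByIso6393.has(iso6393)) return undefined;
  // Prefer the two-letter code; fall back to the three-letter one.
  return iso6391ByIso6393.get(iso6393) || iso6393;
}

console.log(localeDataCode('cym')); // cy
console.log(localeDataCode('ckb')); // ckb
```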

    Searching for "austri" in the SIL list just finds Austrian Sign Language. Searching for "german" yields a number of German dialects, but nothing identified as Austrian. Wikipedia tells me most Austrians speak Bavarian, which is bar. Searching for "braz" doesn't turn up a Brazilian Portuguese; searching for "portu" turns up several Portuguese dialects, so you'll have to figure out which of those is relevant for your target population.