Tags: localization, internationalization, locale, l10n.js

Should I maintain different files for different locales?


I have an application that currently supports the 'en' and 'fr' locales, and I maintain one language file per locale, i.e. 'en.json' and 'fr.json'.

Now, for a user from the USA the locale comes in as 'en_US', from Canada as 'en_CA', from Britain as 'en_GB', etc.

So, as a best practice, is it recommended that I maintain different files for the different English locales, or should I treat all English locales (en_CA, en_US, en_GB) as 'en' and refer to one file for all of them?


Solution

  • As usual, it depends.

    Typically, you will have only one English file, containing international-English messages. In that case you don't maintain a separate file for each variant; instead you fall back to international English (a request for en-US, en-CA, etc. is served messages from en.json).

    Judging by your nickname, you probably know that it is sometimes better to maintain separate messages for specific cultures, for which US English messages (typically used as international English) might simply be too direct.
    If there is a demand for a separate locale version (e.g. en-IN), you would serve messages from the specific file (e.g. en-IN.json), but fall back to en for every other English locale (en-GB, en-AU, etc.).
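
    The basic fall-back described above can be sketched roughly as follows. This is only an illustration: the `available` set, `normalizeLocale` and `resolveLocale` are hypothetical names, and the normalization handles only language and region subtags (so `en_US` becomes `en-US`), not script subtags.

```javascript
// Hypothetical list of locales for which a message file exists
// (en.json, en-IN.json, fr.json).
const available = new Set(["en", "en-IN", "fr"]);

// Turn "en_US" or "EN-us" into "en-US". Only language and region
// subtags are handled in this sketch.
function normalizeLocale(tag) {
  const [language, region] = tag.replace(/_/g, "-").split("-");
  return region
    ? `${language.toLowerCase()}-${region.toUpperCase()}`
    : language.toLowerCase();
}

// Prefer the exact locale, then the base language, then the default.
function resolveLocale(requested, defaultLanguage = "en") {
  const locale = normalizeLocale(requested);
  if (available.has(locale)) return locale;
  const base = locale.split("-")[0];
  return available.has(base) ? base : defaultLanguage;
}

console.log(resolveLocale("en_US")); // en
console.log(resolveLocale("en-IN")); // en-IN
console.log(resolveLocale("fr_CA")); // fr
```

    With this shape, adding a culture-specific file later only means adding its name to the set of available locales; no other rule changes.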

    Resource fall-back (the term specialists use for what I described above) can be quite painful to implement. Usually you fall back to the base language (en for any en-XX), but there are some corner cases you need to know about: Portuguese/Brazilian Portuguese, Norwegian and Chinese. For Portuguese, you should localize into Brazilian Portuguese (pt-BR) and serve it for requests for pt or any pt-XX, as Brazilian Portuguese is now the standard one. This is easily done by creating a single file, pt.json, holding the Brazilian Portuguese messages and letting every Portuguese locale fall back to pt.

    It is not that simple for either Norwegian or Chinese.

    There are two versions of Norwegian language:

    • Norwegian Nynorsk (locale nn-NO)
    • Norwegian Bokmål (locale nb-NO)

    There is also a so-called macrolanguage (locale no).
    Unless you maintain two separate versions of Norwegian (which is unlikely), you should use one resource file (i.e. no.json <- sounds funny, doesn't it?) and fall back to it for any request for nn-NO, nb-NO, nb, nn and no-NO (I believe plain no will be covered).
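
    As a rough sketch of that rule, assuming a single no.json file: every tag whose language subtag is no, nb or nn is routed to the same file. The function name `norwegianFileFor` is made up for this example.

```javascript
// Language subtags that should all be served from no.json
// when only one Norwegian translation is maintained.
const NORWEGIAN_LANGUAGES = new Set(["no", "nb", "nn"]);

// Returns "no.json" for any Norwegian locale, or null so that the
// caller can apply the general fall-back for other languages.
function norwegianFileFor(locale) {
  const language = locale.split("-")[0].toLowerCase();
  return NORWEGIAN_LANGUAGES.has(language) ? "no.json" : null;
}

console.log(norwegianFileFor("nn-NO")); // no.json
console.log(norwegianFileFor("nb"));    // no.json
console.log(norwegianFileFor("en-US")); // null
```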

    Chinese is even more complicated. You may have heard of Simplified Chinese (locale zh-Hans) and Traditional Chinese (zh-Hant). If you ever need to localize into Chinese, it makes sense to maintain two separate Chinese files (i.e. zh-Hans.json and zh-Hant.json) and fall back requests as follows:

    • zh, zh-CN and zh-SG to zh-Hans
    • zh-HK, zh-MO and zh-TW to zh-Hant
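
    One way to express this mapping is a small lookup table, sketched below. The names `CHINESE_FALLBACK` and `resolveChinese` are illustrative, and the Hong Kong entry uses the region code HK.

```javascript
// Region-based Chinese locales mapped to the script-based file to serve.
const CHINESE_FALLBACK = new Map([
  ["zh", "zh-Hans"],
  ["zh-CN", "zh-Hans"],
  ["zh-SG", "zh-Hans"],
  ["zh-HK", "zh-Hant"],
  ["zh-MO", "zh-Hant"],
  ["zh-TW", "zh-Hant"],
]);

// Returns "zh-Hans" or "zh-Hant" for a Chinese locale, or null for
// anything this table does not cover.
function resolveChinese(locale) {
  if (locale === "zh-Hans" || locale === "zh-Hant") return locale;
  return CHINESE_FALLBACK.get(locale) ?? null;
}

console.log(resolveChinese("zh-TW")); // zh-Hant
console.log(resolveChinese("zh"));    // zh-Hans
```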

    I hope this gives you a better understanding. It is worth considering your future localization plans so that the resource fall-back mechanism is as simple as it can be (but no simpler). If you only ever need to support languages like English, French, Italian, German and Spanish, there is no point in implementing complex rules: simply check whether xx-XX.json exists and serve it if it does; if not, check whether xx.json exists (and serve it); otherwise fall back to the default application language (en.json, I guess?).
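
    For the simple case, the file check could look something like the sketch below. It assumes a Node.js environment and a directory of `<locale>.json` files; `findMessageFile` and its parameters are hypothetical names, not part of any library.

```javascript
const fs = require("fs");
const path = require("path");

// Try xx-XX.json, then xx.json, then the default language file.
// Returns the path of the first file that exists, or null if none does.
function findMessageFile(dir, locale, defaultLanguage = "en") {
  const language = locale.split("-")[0];
  const candidates = [locale, language, defaultLanguage];
  for (const name of candidates) {
    const file = path.join(dir, `${name}.json`);
    if (fs.existsSync(file)) return file;
  }
  return null;
}

// Example call, assuming the message files live in ./locales:
// const file = findMessageFile("./locales", "fr-CA");
// const messages = file ? JSON.parse(fs.readFileSync(file, "utf8")) : {};
```

    Checking the file system on every request is wasteful, so in practice the result would probably be cached per locale.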