
How do I get a locale code from a language code in PHP?


Basically, I have a lot of language codes (it, en, en-GB, de, de-CH, and so on), and from these I need to get a full locale code in the format LANGCODE-COUNTRYCODE, using the language's default country when no country code is specified.

An example of what I mean/need:

INPUT      OUTPUT  

it     ->  it-IT  
it-IT  ->  it-IT  
en-GB  ->  en-GB  
en     ->  en-US  
es-AR  ->  es-AR   
es-MX  ->  es-MX 
es     ->  es-ES  
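
The rule in the table boils down to one check: does the tag already carry a country (region) code? Below is a minimal pure-PHP sketch of that check. hasRegion() is a hypothetical helper. It assumes simplified BCP 47-style tags (language, an optional four-letter script, then a two-letter or three-digit region such as es-419), accepts _ as well as - as the separator, and ignores rarer shapes such as extended language subtags.

```php
<?php

// Hypothetical helper: true when a language tag already carries a region.
// Simplified BCP 47 shape assumed: language[-Script][-REGION][-...],
// with '-' or '_' as the separator; extended language subtags are ignored.
function hasRegion(string $tag): bool
{
    $subtags = preg_split('/[-_]/', $tag);
    $i = 1; // index 0 is the language itself

    // Skip an optional four-letter script subtag such as "Hant" or "Latn".
    if (isset($subtags[$i]) && preg_match('/^[A-Za-z]{4}$/', $subtags[$i]) === 1) {
        $i++;
    }

    // A region is two letters (IT, GB) or three digits (419).
    return isset($subtags[$i])
        && preg_match('/^(?:[A-Za-z]{2}|[0-9]{3})$/', $subtags[$i]) === 1;
}

foreach (['it', 'it-IT', 'en', 'en-GB', 'es-419', 'zh-Hant'] as $tag) {
    echo $tag, ': ', hasRegion($tag) ? 'keep as is' : 'needs a default country', PHP_EOL;
}
```

Only tags for which this returns false need a lookup in a language-to-default-country table.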

Is there a library I'm unaware of, or a simple way of achieving this in PHP?

I've searched Google a lot, but either a solution doesn't exist or I'm using the wrong keywords. Do I really have to build this array by hand? There must be a better way!


Solution

  • With the help of iso.org and localeplanet.com, plus some good old googling and a lot of elbow grease, I came up with the list below. It might not be perfect, but it does the job for me. I hope it helps others too!

    <?php
        
        $defaultLocales = [
            'af' => 'af-ZA',
            'am' => 'am-ET',
            'as' => 'as-IN',
            'az' => 'az-AZ',
            'ba' => 'ba-RU',
            'be' => 'be-BY',
            'bg' => 'bg-BG',
            'bn' => 'bn-IN',
            'bo' => 'bo-CN',
            'br' => 'br-FR',
            'ca' => 'ca-ES',
            'co' => 'co-FR',
            'cs' => 'cs-CZ',
            'cy' => 'cy-GB',
            'da' => 'da-DK',
            'de' => 'de-DE',
            'el' => 'el-GR',
            'en' => 'en-US',
            'es' => 'es-ES',
            'et' => 'et-EE',
            'eu' => 'eu-ES',
            'fi' => 'fi-FI',
            'fo' => 'fo-FO',
            'fr' => 'fr-FR',
            'fy' => 'fy-NL',
            'ga' => 'ga-IE',
            'gd' => 'gd-GB', // Scottish Gaelic is spoken in Scotland (GB), not Ireland
            'gl' => 'gl-ES',
            'gu' => 'gu-IN',
            'he' => 'he-IL',
            'hi' => 'hi-IN',
            'hr' => 'hr-HR',
            'hu' => 'hu-HU',
            'hy' => 'hy-AM',
            'id' => 'id-ID',
            'in' => 'in-ID', // legacy code for Indonesian, superseded by 'id'
            'is' => 'is-IS',
            'it' => 'it-IT',
            'iw' => 'iw-IL', // legacy code for Hebrew, superseded by 'he'
            'ja' => 'ja-JP',
            'ka' => 'ka-GE',
            'kk' => 'kk-KZ',
            'kl' => 'kl-GL',
            'km' => 'km-KH',
            'kn' => 'kn-IN',
            'ko' => 'ko-KR',
            'kok' => 'kok-IN',
            'ky' => 'ky-KG',
            'lo' => 'lo-LA',
            'lt' => 'lt-LT',
            'lv' => 'lv-LV',
            'mi' => 'mi-NZ',
            'mk' => 'mk-MK',
            'ml' => 'ml-IN',
            'mn' => 'mn-MN',
            'mr' => 'mr-IN',
            'ms' => 'ms-MY',
            'mt' => 'mt-MT',
            'nb' => 'nb-NO',
            'ne' => 'ne-NP',
            'nl' => 'nl-NL',
            'nn' => 'nn-NO',
            'oc' => 'oc-FR',
            'or' => 'or-IN',
            'pl' => 'pl-PL',
            'ps' => 'ps-AF',
            'pt' => 'pt-PT',
            'ro' => 'ro-RO',
            'ru' => 'ru-RU',
            'rw' => 'rw-RW',
            'sa' => 'sa-IN',
            'si' => 'si-LK',
            'sk' => 'sk-SK',
            'sq' => 'sq-AL',
            'sr' => 'sr-RS',
            'sv' => 'sv-SE',
            'ta' => 'ta-IN',
            'te' => 'te-IN',
            'th' => 'th-TH',
            'tk' => 'tk-TM',
            'tr' => 'tr-TR',
            'tt' => 'tt-RU',
            'uk' => 'uk-UA',
            'ur' => 'ur-PK',
            'uz' => 'uz-UZ',
            'vi' => 'vi-VN',
            'wo' => 'wo-SN',
            'xh' => 'xh-ZA',
            'zh' => 'zh-CN',
            'zu' => 'zu-ZA'
        ];
        
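        // Returns the default locale for a bare language code (e.g. 'it' -> 'it-IT').
        // Full locales ('en-GB') and codes without a default ('ar') are returned unchanged.
        function getLocaleFromLang(string $lang): string {
            global $defaultLocales;
            return $defaultLocales[$lang] ?? $lang;
        }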
        
    ?>
    

    If you have any suggestions on how I might improve it, feel free to comment below!
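
    One possible variant of getLocaleFromLang() keeps the table inside the function as a static variable instead of relying on global. With global, if the file is ever included from inside a function, $defaultLocales is no longer a global and the ?? lookup silently returns the input unchanged. The variant also normalizes input such as EN or en_gb before the lookup. This is a sketch, not a tested drop-in: only four entries of the list are repeated, and the normalization only handles the simple language-REGION shape.

```php
<?php

// Variant of getLocaleFromLang() (a sketch): no `global`, light input normalization.
// Assumption: only a few entries of the full list are repeated here.
function getLocaleFromLang(string $lang): string
{
    // A static local is initialized once and lives inside the function,
    // so the lookup works no matter which scope this file is included from.
    static $defaultLocales = [
        'de' => 'de-DE',
        'en' => 'en-US',
        'es' => 'es-ES',
        'it' => 'it-IT',
        // ...remaining entries from the list
    ];

    // Normalize "EN", "en_gb", "de-ch" to "en", "en-GB", "de-CH".
    $parts = explode('-', str_replace('_', '-', trim($lang)));
    $parts[0] = strtolower($parts[0]);
    if (isset($parts[1]) && strlen($parts[1]) === 2) {
        $parts[1] = strtoupper($parts[1]); // two-letter country code
    }
    $normalized = implode('-', $parts);

    // Bare language with a known default -> full locale;
    // anything else is returned in its normalized form.
    return $defaultLocales[$normalized] ?? $normalized;
}

echo getLocaleFromLang('EN'), PHP_EOL;    // en-US
echo getLocaleFromLang('de_ch'), PHP_EOL; // de-CH
echo getLocaleFromLang('ar'), PHP_EOL;    // ar
```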

    [EDIT]

    An explanation of the criteria I used.

    If you have the Italian language code it, we can assume it means it-IT (Italian in Italy). Otherwise, you'd specify it-CH for Italian in Switzerland or it-SM for Italian in San Marino. This case is easy.

    English is different. If you apply the same rule, en-GB would be the default instead of en-US, which IMO is how it should be. But after decades of en-US being the default locale for most software, it's only right for it to be the default here too.
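
    If the en-US choice (or any other judgment call in the list) doesn't fit a particular application, the defaults can be overridden without editing the shared list. Below is a small sketch using PHP's built-in array_replace(). The $appOverrides array and its en-GB entry are hypothetical.

```php
<?php

// Shared defaults (only two entries of the full list repeated here).
$defaultLocales = [
    'en' => 'en-US',
    'pt' => 'pt-PT',
];

// Hypothetical per-application overrides for judgment-call languages.
$appOverrides = [
    'en' => 'en-GB',
];

// array_replace() takes values from later arrays for matching keys
// and keeps every other entry from the first array.
$locales = array_replace($defaultLocales, $appOverrides);

echo $locales['en'], PHP_EOL; // en-GB
echo $locales['pt'], PHP_EOL; // pt-PT
```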

    I'd say half of the "defaults" were easy, but there were also a lot of tricky cases where I wasn't familiar with the language or the country. In those cases I looked up which country has the most native speakers, or, if that was inconclusive, something else that ties the language to a country of origin.

    Some cases were hopeless, like ar (Arabic). It's spoken in so many countries that I feel there's no point in giving it a "default" locale, so it isn't in the list and getLocaleFromLang('ar') simply returns 'ar'.

    An explanation of why I need this.

    I'm willing to make these "discriminations" because of one reason: this is only a fallback situation!

    As input, I mostly get locales made of a language and a country code, like en-AU, es-AR, pt-BR, and so on. Sometimes, though, I get only the language code, and when that happens some small mechanics break, first of all the formatting of numbers and other values.

    It's mainly a contingency for user input: the user can specify a country along with the language, but if they don't, only the language code gets stored. So I'm stuck with a mix of full locale codes and bare language codes, and when only the language is provided I have to fall back on a default locale so the rest of the code (formatting in particular) keeps working as the user expects.
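
    For mixed input like this, the lookup can run over every stored code. Anything that still has no country code afterwards (such as ar, which deliberately has no default) can be collected for separate handling. This is a sketch: the sample input and the trimmed table are assumptions.

```php
<?php

// Only a few entries from the list are repeated here (assumption).
$defaultLocales = ['en' => 'en-US', 'it' => 'it-IT', 'pt' => 'pt-PT'];

// Hypothetical sample of stored user input: full locales mixed with bare languages.
$userInput = ['en-AU', 'es-AR', 'pt-BR', 'it', 'en', 'ar'];

$resolved = [];
$unresolved = [];
foreach ($userInput as $code) {
    // Same lookup as getLocaleFromLang(): default if known, else unchanged.
    $locale = $defaultLocales[$code] ?? $code;
    $resolved[] = $locale;

    // Still no '-': no default country exists (e.g. "ar"), so downstream
    // formatting has to cope with a language-only code.
    if (strpos($locale, '-') === false) {
        $unresolved[] = $locale;
    }
}

echo implode(', ', $resolved), PHP_EOL;   // en-AU, es-AR, pt-BR, it-IT, en-US, ar
echo implode(', ', $unresolved), PHP_EOL; // ar
```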

    I hope this clarifies the situation.