php multithreading apache localization gettext

Gettext falls back to English for non-root language when running via PHP script executed over SSH


I'm preparing an application whose server-side is based on PHP, with the translations implemented using the native gettext functions.

To load the locale that's supposed to be used for the retrieval of the gettext translations, I follow the docs and use this:

// Set language to German
putenv('LC_ALL=de_DE');
setlocale(LC_ALL, 'de_DE');

So far this has always worked reliably for my REST API, which makes use of gettext translations in the linked / described way. However, I have a case where it does not work as I would expect.

I have the following PHP-script:

$locales = [
    'de_CH',
    'en_GB',
    'fr_FR',
    'es_ES',
    'pt_PT',
    'it_IT'
];

$textdomain_loaded = false;

foreach ($locales as $locale_value) {

    var_dump(putenv("LC_ALL=$locale_value.UTF-8"));
    var_dump(
        setlocale(
            LC_ALL,
            "$locale_value.UTF-8"
        )
    );

    if ($textdomain_loaded === false) {

        bindtextdomain(
            domain   : 'firstdomain',
            directory: ROOT . '/translations'
        );

        bindtextdomain(
            domain   : 'seconddomain',
            directory: ROOT . '/translations'
        );

        // Set default textdomain to be first, as most translations are retrieved from there
        textdomain('firstdomain');
        
        $textdomain_loaded = true;

    }


    var_dump(_(message: 'Beispiel'));

}

All the .mo-files are properly translated and generated. If I execute the script via a .sh script (.zsh) that simply calls the script via

php -r 'require "path/to/php/script.php"; MyClass::for_loop();'

or if I fire an HTTP request to a controller that executes the script, I get the following output:

bool(true)
string(11) "de_CH.UTF-8"
string(8) "Beispiel"
bool(true)
string(11) "en_GB.UTF-8"
string(7) "Example"
bool(true)
string(11) "fr_FR.UTF-8"
string(7) "Exemple"
bool(true)
string(11) "es_ES.UTF-8"
string(7) "Ejemplo"
bool(true)
string(11) "pt_PT.UTF-8"
string(7) "Esemplo"
bool(true)
string(11) "it_IT.UTF-8"
string(7) "Esempio"

If I however execute the exact same .sh script (of course with the accordingly adapted path, but the exact same php script) on the server via ssh connection in a terminal, the outputs I get are:

bool(true)
string(11) "de_CH.UTF-8"
string(8) "Beispiel"
bool(true)
string(11) "en_GB.UTF-8"
string(7) "Example"
bool(true)
string(11) "fr_FR.UTF-8"
string(7) "Example"
bool(true)
string(11) "es_ES.UTF-8"
string(7) "Example"
bool(true)
string(11) "pt_PT.UTF-8"
string(7) "Example"
bool(true)
string(11) "it_IT.UTF-8"
string(7) "Example"

So switching the locale (environment and PHP locale) seemingly works, but the retrieved gettext translations always fall back to the English translation.

I've run locale -a on my server; all the locales that I set / call are defined / installed on my server.

When running gettext --help, I can see the following part in the output:

Standard search directory: /usr/local/share/locale

While my .mo files are stored under the translations directory of the root, as mentioned above, e.g.:

/translations/en_GB.UTF-8/LC_MESSAGES/firstdomain.mo
/translations/en_GB.UTF-8/LC_MESSAGES/seconddomain.mo
/translations/fr_FR.UTF-8/LC_MESSAGES/firstdomain.mo
/translations/fr_FR.UTF-8/LC_MESSAGES/seconddomain.mo

and so on, for all of the non-German locales specified above (the msgids are in German, as German is the root language).
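To rule out a missing or misplaced catalog on the server, it can help to compute the path each catalog is expected to have and check that the file exists there. The following is only a minimal sketch: `expected_mo_path` and `missing_catalogs` are hypothetical helper names, and the layout `<directory>/<locale>/LC_MESSAGES/<domain>.mo` is taken from the paths listed above. gettext may also try other spellings of a locale name, so an entry reported here is a hint rather than proof.

```php
<?php
// Hypothetical helper: builds the catalog path for one locale/domain pair,
// following the <directory>/<locale>/LC_MESSAGES/<domain>.mo layout shown above.
function expected_mo_path(string $directory, string $locale, string $domain): string
{
    return rtrim($directory, '/') . "/$locale/LC_MESSAGES/$domain.mo";
}

// Hypothetical helper: returns the locales for which no catalog file
// exists at the expected path under $directory.
function missing_catalogs(string $directory, array $locales, string $domain): array
{
    $missing = [];
    foreach ($locales as $locale) {
        if (!is_file(expected_mo_path($directory, $locale, $domain))) {
            $missing[] = $locale;
        }
    }
    return $missing;
}
```

Running `var_dump(missing_catalogs(ROOT . '/translations', ['en_GB.UTF-8', 'fr_FR.UTF-8'], 'firstdomain'));` both locally and over SSH should print an empty array in both places if the files are where the script expects them.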

So my question after all is, why is this problem happening?

I found out that it is actually English that is used as the default translation language for non-German gettext retrieval. For example, if you overwrite the /translations/en_GB.UTF-8/LC_MESSAGES/firstdomain.mo file with the contents of /translations/pt_PT.UTF-8/LC_MESSAGES/firstdomain.mo and run the script via SSH on the server, the output becomes:

bool(true)
string(11) "de_CH.UTF-8"
string(8) "Beispiel"
bool(true)
string(11) "en_GB.UTF-8"
string(7) "Esemplo"
bool(true)
string(11) "fr_FR.UTF-8"
string(7) "Esemplo"
bool(true)
string(11) "es_ES.UTF-8"
string(7) "Esemplo"
bool(true)
string(11) "pt_PT.UTF-8"
string(7) "Esemplo"
bool(true)
string(11) "it_IT.UTF-8"
string(7) "Esemplo"

So it seems that the script always falls back to the translations provided via the file /translations/en_GB.UTF-8/LC_MESSAGES/firstdomain.mo, but only if the script is run in an SSH terminal session. Why?

NOTES:

Doing putenv("LANGUAGE="); before setting the locale, or putenv("LC_ALL="); as mentioned here did not change anything.

Doing putenv("LANGUAGE="); before setting the locale, no matter if with the locale, including or excluding the .UTF-8 suffix, or just the first two string characters as done here also did not solve the problem.

Doing putenv('LANGUAGE=nl_NL'); as noted here also did not do the trick.

I noticed the note in the PHP manual for setlocale about "multithreaded server APIs", and saw this existing question, but I'm not sure if that's relevant to this case.
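Since the behaviour differs between the local run and the SSH session, one thing worth comparing is the locale-related environment each run starts with. The sketch below is a diagnostic only, not a fix; `locale_env_snapshot` is a hypothetical helper name, and the list of variables is an assumption about which ones are commonly involved in message lookup.

```php
<?php
// Hypothetical diagnostic helper: collects the environment variables that
// commonly influence message lookup, so that two runs can be compared.
function locale_env_snapshot(): array
{
    $snapshot = [];
    foreach (['LANGUAGE', 'LC_ALL', 'LC_MESSAGES', 'LANG'] as $name) {
        $value = getenv($name);
        // getenv() returns false when the variable is not set at all.
        $snapshot[$name] = ($value === false) ? null : $value;
    }
    // Passing "0" as the locale only queries the current setting.
    // LC_MESSAGES is only defined when PHP was built with libintl.
    $snapshot['current LC_MESSAGES'] = defined('LC_MESSAGES')
        ? setlocale(constant('LC_MESSAGES'), '0')
        : null;
    return $snapshot;
}
```

Calling `var_dump(locale_env_snapshot());` at the top of the script and again inside the loop, once locally and once over SSH, shows whether the two runs start from the same settings.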

I now found out that I can actually reproduce the output I get from the server locally if I replace the part:

var_dump(putenv("LC_ALL=$locale_value.UTF-8"));
var_dump(
    setlocale(
        LC_ALL,
        "$locale_value.UTF-8"
    )
);

with:

var_dump(putenv("LANG=$locale_value.UTF-8"));

To me this indicates that for some reason,

putenv("LC_ALL=$locale_value.UTF-8");
setlocale(LC_ALL,"$locale_value.UTF-8");

is enough to switch the gettext locale when running the script locally, but it is not enough (or is disregarded) when switching the locale on an Apache server.


Solution

  • Thanks to the accepted answer of this sacred post, I could find a solution. Change my code example above:

    foreach ($locales as $locale_value) {
    
        var_dump(putenv("LC_ALL=$locale_value.UTF-8"));
        var_dump(
            setlocale(
                LC_ALL,
                "$locale_value.UTF-8"
            )
        );
    
        if ($textdomain_loaded === false) {
    
            bindtextdomain(
                domain   : 'firstdomain',
                directory: ROOT . '/translations'
            );
    
            bindtextdomain(
                domain   : 'seconddomain',
                directory: ROOT . '/translations'
            );
    
            // Set default textdomain to be first, as most translations are retrieved from there
            textdomain('firstdomain');
            
            $textdomain_loaded = true;
    
        }
    
    
        var_dump(_(message: 'Beispiel'));
    
    }
    

    to this:

    foreach ($locales as $locale_value) {
    
        var_dump(putenv("LC_ALL=$locale_value.UTF-8"));
        var_dump(
            setlocale(
                LC_ALL,
                "$locale_value.UTF-8"
            )
        );
    
        bindtextdomain(
            domain   : 'firstdomain',
            directory: ROOT . '/translations'
        );
    
        bindtextdomain(
            domain   : 'seconddomain',
            directory: ROOT . '/translations'
        );
    
        // Set default textdomain to be first, as most translations are retrieved from there
        textdomain('firstdomain');
    
    
        var_dump(_(message: 'Beispiel'));
    
    }
    

    That is, repeat the text domain binding and selection (bindtextdomain() and textdomain()) on every iteration instead of doing it only once. For usual PHP processes, this does not seem to be required, but if you run your PHP script through an SSH command on an Apache server running PHP as PHP-FPM (FCGI), then it is required. Why did it do the trick? Absolutely no clue; I would still be very interested in the reason.
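To avoid repeating these calls at every place where the locale changes, the fix can be wrapped in one function. This is a minimal sketch under a few assumptions: `switch_gettext_locale` is a hypothetical name, the gettext extension is loaded, and the first entry of `$domains` is the one that should become the default domain.

```php
<?php
// Hypothetical helper wrapping the fix above: switches the locale and then
// re-binds every text domain, so the bindings are repeated on each switch.
// Assumes the gettext extension is loaded.
function switch_gettext_locale(string $locale, string $directory, array $domains): bool
{
    if ($domains === []) {
        // Nothing to bind, so there is no default domain to select.
        return false;
    }
    putenv("LC_ALL=$locale");
    if (setlocale(LC_ALL, $locale) === false) {
        // setlocale() rejected the locale, e.g. because it is not installed.
        return false;
    }
    foreach ($domains as $domain) {
        bindtextdomain($domain, $directory);
    }
    // The first domain in the list becomes the default one.
    textdomain($domains[0]);
    return true;
}
```

Inside the loop, the body then reduces to `switch_gettext_locale("$locale_value.UTF-8", ROOT . '/translations', ['firstdomain', 'seconddomain']);` followed by the `_()` call.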