php html apache lang

Convert en_US to en-US


I'm writing a PHP application that supports multiple languages.

When setting the locale in PHP, I am required to provide a value defined in what I believe to be RFC 1766 / ISO 639, according to the setlocale documentation.

setlocale( LC_ALL, 'en_US' );
var_dump( setlocale( LC_MESSAGES, '0' ) );
// string(5) "en_US"

When I use this locale as the value of the HTML lang attribute, validation fails because it does not conform to RFC 5646. The RFC 5646 value for this language is actually en-US (note the hyphen instead of an underscore).

Using this value in PHP's setlocale function, as above, results in the following output:

string(1) "C"

I have no idea why it returns C, but I presume it is because the locale I provided was incorrectly formatted, with C being the original server default, which is described as ASCII (thanks to @Cheery for the reference).

So, I'm wondering what I should do about that. I could, feasibly, use PHP's str_replace function to switch _ to - before outputting the lang attribute, like so:

<?php setlocale( LC_ALL, 'en_US' ); ?>
<!doctype html>
<html lang="<?= str_replace( '_', '-', setlocale(LC_MESSAGES, '0') ); ?>">
...

But I'm concerned that there may be other differences between the two language specifications that could cause unexpected problems down the road. If so, is there a preferred way to translate the language codes that is already in PHP, or a translation class that can be used?

Bonus question: why does my server default to a value of C for the locale?


Solution

  • Keep in mind that setlocale accepts many kinds of locale names, including plain names and mixed forms. For example, from the PHP documentation:

    $loc_de = setlocale(LC_ALL, 'de_DE@euro', 'de_DE', 'de', 'ge');
    

    Here, 'de_DE@euro' is not a valid HTML lang code.

    So first, you need to ensure that the value is in the format lang_region before trying to convert it.
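
A minimal sketch of that check, assuming the only extras to strip are a codeset suffix (such as ".UTF-8") and a modifier (such as "@euro"). The function name localeToLangTag is hypothetical, not part of PHP, and the pattern only checks the shape of the value: it does not verify that the language or region is actually registered, so a name like 'ge' would still pass through.

```php
<?php
/**
 * Hypothetical helper: convert a POSIX-style locale name into a value
 * suitable for the HTML lang attribute.
 * Returns null when the name is not in lang or lang_region form.
 */
function localeToLangTag(string $locale): ?string
{
    // Drop the codeset (".UTF-8") and modifier ("@euro") suffixes, if any.
    $base = preg_replace('/[.@].*$/', '', $locale);

    // Accept a 2-3 letter language, optionally followed by "_" and a
    // two-letter region. Anything else ("C", "POSIX", "german") is rejected.
    if (!preg_match('/^([a-z]{2,3})(?:_([a-z]{2}))?$/i', (string) $base, $m)) {
        return null;
    }

    // Conventional casing: lowercase language, uppercase region.
    $tag = strtolower($m[1]);
    if (!empty($m[2])) {
        $tag .= '-' . strtoupper($m[2]);
    }

    return $tag;
}

var_dump(localeToLangTag('de_DE@euro')); // string(5) "de-DE"
var_dump(localeToLangTag('C'));          // NULL
```

Because the helper returns null for names it cannot convert, the template can choose its own fallback (or omit the lang attribute) rather than emit something like lang="C".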