php, internationalization, gettext

Gettext not detecting utf8 properly in PHP


I have a PHP application that uses Gettext as its i18n engine. Translation itself works fine; the only problem is that I'm having encoding issues with UTF-8 characters. My PHP code to load gettext looks something like this:

bindtextdomain( $domain, PATH_BASE . DS . "language" . DS );
$this->utf8Encode = strtolower($encoding) == "utf-8";
bind_textdomain_codeset($domain, $encoding);

textdomain($domain);

My templates render the pages using the UTF-8 charset, and I've tried just about everything to load the proper charset. For the current locale I'm loading SL_sl; the names appear correctly except for UTF-8 characters, so where it should show Država, it shows Dr?ava.


Solution

  • So, it has happened before, and now it has happened again: I found the solution myself! The problem was that, like I said to @bozdoz, I was already converting the text to UTF-8, but I didn't realize that the gettext function returns a UTF-8 string, so if you do this:

    $encoded = utf8_encode($utf8String);
    

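    Then you'll have a really nasty bug when $utf8String is already a UTF-8 string: utf8_encode assumes its input is ISO-8859-1, so the text gets encoded a second time and the non-ASCII characters come out garbled. Therefore I made some modifications to my code, and the translation method (simplified) ended up like this: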

    $translation = gettext($singular);
    $encoded = $this->utf8Encode ? $this->Utf8Encode($translation) : $translation;
    

    And the Utf8Encode method is like this:

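    private function Utf8Encode( $text )
    {
        // Already valid UTF-8: return it as-is to avoid double encoding.
        if ( mb_check_encoding($text, "UTF-8") ) {
            return $text;
        }

        // Otherwise treat the input as ISO-8859-1 and convert it to UTF-8.
        return utf8_encode($text);
    }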
    

    I hope that if somebody has the same error this can help!
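
For anyone who wants to see the double-encoding effect in isolation, here is a minimal standalone sketch. It is not the code from my application: the function name toUtf8Once is made up for illustration, it assumes the mbstring extension is loaded, and it assumes that any input which is not valid UTF-8 is ISO-8859-1. It uses mb_convert_encoding from ISO-8859-1 in place of utf8_encode (which is deprecated as of PHP 8.2); the two perform the same conversion.

```php
<?php
// Hypothetical helper (name is illustrative, not from the original code):
// convert to UTF-8 only when the input is not already valid UTF-8.
// Assumption: anything that is not valid UTF-8 is ISO-8859-1.
function toUtf8Once(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8, leave it untouched
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

// "Država" as a UTF-8 string; the letter ž is the two bytes C5 BE.
$utf8 = "Dr\u{017E}ava";

// Unconditional conversion, which is what utf8_encode() does:
// each byte of ž is treated as a separate ISO-8859-1 character.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex("\u{017E}"), "\n";                                            // c5be
echo bin2hex(mb_convert_encoding("\u{017E}", 'UTF-8', 'ISO-8859-1')), "\n"; // c385c2be
echo strlen($utf8), " vs ", strlen($double), "\n";                         // 7 vs 9

// The guarded version leaves UTF-8 input alone and still converts Latin-1.
$safe   = toUtf8Once($utf8);   // unchanged
$latin1 = toUtf8Once("\xE9");  // ISO-8859-1 "é" becomes the bytes C3 A9
```

The check-then-convert approach is only a heuristic: a short ISO-8859-1 string can happen to be valid UTF-8 too, so where possible it is safer to know the source encoding up front (for example, by relying on what was passed to bind_textdomain_codeset) than to guess.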