php unicode multilingual tcpdf

php ucfirst ucwords discussion


I just wanted to share my experience of needing a language-independent version of ucfirst. The problem arises when you mix English text with Japanese, Chinese, or other languages, in my case sometimes Swedish with ÅÄÖ: the traditional ucfirst has trouble capitalizing such strings.

Some time ago, however, I stumbled across the following code snippet here on Stack Overflow:

function myucfirst($str) {
    $fc = mb_strtoupper(mb_substr($str, 0, 1));
    return $fc.mb_substr($str, 1);
}

It works fine in most cases, but recently I also needed the translations to auto-generate texts in dynamic PDFs using TCPDF.

This is when I started banging my head over why TCPDF had issues with the text. I had no problems anywhere else; the character encoding was UTF-8, but it still broke.

When showing kanji for Japanese, I simply skipped the above function when capitalizing the word, but then with Swedish I ran into the same breakage when I needed to capitalize ÅÄÖ.

That led me to realize that the problem with the function above is that, in my setup, it was effectively only looking at the first byte. In UTF-8, each of ÅÄÖ takes up 2 bytes and Chinese or Japanese kanji take up 3 bytes; the function above did not account for that, which broke TCPDF.
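The widths mentioned above are byte counts in UTF-8. As a small sketch, assuming the script file itself is saved as UTF-8, the following compares byte length with character length and shows how a byte-wise substr() can cut a character in half. The $samples array and variable names are only for illustration.

```php
<?php
// Assumption: this file is saved as UTF-8, so the string literals are UTF-8 bytes.
$samples = ['A', 'Ö', '隣'];

foreach ($samples as $ch) {
    // strlen() counts bytes; mb_strlen() with an explicit encoding counts characters.
    printf("%s: %d byte(s), %d character(s)\n", $ch, strlen($ch), mb_strlen($ch, 'UTF-8'));
}

// Taking only the first byte of a 2-byte character leaves an invalid UTF-8 sequence.
$firstByte = substr('Övrigt', 0, 1);
var_dump(mb_check_encoding($firstByte, 'UTF-8')); // bool(false)

// Taking the first character with an explicit encoding keeps it intact.
$firstChar = mb_substr('Övrigt', 0, 1, 'UTF-8');
var_dump($firstChar === 'Ö'); // bool(true)
```

Passing the encoding explicitly to the mb_* functions avoids depending on whatever mbstring's internal encoding happens to be on a given server.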

To give more context: when generating PDF documents with TCPDF, the TCPDF font handling ends up with errors, since the general mbstring function turns the first character into "?�"vrigt for the Swedish word Övrigt and, for instance, "?��"のととろ for the Japanese 隣のトトロ (My Neighbour Totoro). This makes the font translation for the � fail. To convert ÅÄÖ properly, you need to convert the first two bytes, substr($str, 0, 2).

Also, I am not sure whether you can see the code examples I gave, but since neither Chinese nor Japanese uses upper-case letters, I am excluding every character that takes 3 bytes, since those have no upper/lower case at all. I don't really want to exclude them, but passing them through mbstring leads to similar errors in TCPDF, so my examples are a workaround for now, unless someone has a better solution.

So my approach was to solve the above problem with the following function:

function myucfirst($str) {
    if ($str[0] !== "?"){
        for($i = 1; $i <= 3; $i++){
            $first = substr($str, 0, $i);
            $first = mb_convert_case($first, MB_CASE_UPPER, "UTF-8");
            if ($first !== '?'){                
                $rest = substr($str, $i);
                break;
            }
        }
        if ($i < 3){
            $ret_string = $first . $rest;
        } else {
            $ret_string = $str;
        }
    } else {
        $ret_string = $str;
    }   
    return $ret_string;
}

Thanks to Steven Penny's help below, this is the solution that works with both Swedish and Japanese/Chinese characters, even when the string is used with the TCPDF library to create PDFs dynamically:

function myucfirst($str) {
    $ret_string = mb_convert_case($str, MB_CASE_TITLE, 'UTF-8');
    return $ret_string;
}
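One caveat worth knowing: MB_CASE_TITLE capitalizes the first letter of every word and lower-cases the remaining letters, so it behaves more like ucwords than ucfirst. If only the first character should change, something along these lines may work. This is a minimal sketch assuming valid UTF-8 input; myucfirst_only is a hypothetical name that is not part of the original post.

```php
<?php
// Hypothetical helper (not from the original post): upper-case only the first character.
// Assumption: $str is valid in the given encoding (UTF-8 by default).
function myucfirst_only(string $str, string $encoding = 'UTF-8'): string
{
    // mb_substr() counts characters, not bytes, when it is told the encoding.
    $first = mb_substr($str, 0, 1, $encoding);
    $rest  = mb_substr($str, 1, null, $encoding);
    return mb_strtoupper($first, $encoding) . $rest;
}

var_dump(mb_convert_case('hello WORLD', MB_CASE_TITLE, 'UTF-8')); // string(11) "Hello World"
var_dump(myucfirst_only('hello WORLD'));                          // string(11) "Hello WORLD"
var_dump(myucfirst_only('övrigt') === 'Övrigt');                  // bool(true)
var_dump(myucfirst_only('隣のトトロ') === '隣のトトロ');           // bool(true)
```

Characters without an upper-case form, such as kanji and kana, pass through unchanged, so no special case for 3-byte characters is needed here.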

And the following is a similar fix for ucwords:

function myucwords($str) {
    // Split on single spaces and capitalize each word with myucfirst().
    // A string without spaces yields a one-element array, so no special case is needed.
    $words = explode(' ', trim($str));
    // Joining with implode() avoids appending to an undefined variable.
    return implode(' ', array_map('myucfirst', $words));
}

myucwords uses myucfirst to capitalize each word.

Since I am not that experienced as a developer or as a Stack Overflow contributor: you should be able to see 3 code examples, and I would really appreciate hearing about better ways to write these functions. For now, for those who have a similar problem, please enjoy!
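As an alternative sketch of the same idea, a single regular expression in UTF-8 mode can upper-case the first letter of each word while leaving the rest of each word and all whitespace (including tabs and newlines) untouched. The name myucwords_sketch is hypothetical, and the code assumes valid UTF-8 input.

```php
<?php
// Hypothetical helper (not from the original post): upper-case the first letter of each word.
function myucwords_sketch(string $str): string
{
    // (^|\s) captures the start of the string or one whitespace character;
    // (\p{Ll}) captures a lower-case letter right after it. The /u flag enables UTF-8 mode.
    $result = preg_replace_callback(
        '/(^|\s)(\p{Ll})/u',
        function (array $m): string {
            return $m[1] . mb_strtoupper($m[2], 'UTF-8');
        },
        $str
    );
    // preg_replace_callback() returns null on error (for example, invalid UTF-8 input).
    return $result ?? $str;
}

var_dump(myucwords_sketch('övrigt  åäö') === 'Övrigt  Åäö'); // bool(true)
var_dump(myucwords_sketch('隣の トトロ') === '隣の トトロ');   // bool(true)
```

Unlike a version built on MB_CASE_TITLE, this one does not lower-case the remaining letters of each word, which may or may not be what you want.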

/Chris


Solution

  • The examples you gave are poor: with Övrigt, the input is exactly the same as the output. So I modified the examples to make them useful. See below:

    <?php
    # example 1
    $s1 = mb_convert_case('åäö', MB_CASE_TITLE);
    # example 2
    $s2 = mb_convert_case('övrigt', MB_CASE_TITLE);
    # example 3
    $s3 = mb_convert_case('隣のトトロ', MB_CASE_TITLE);
    # print
    var_dump($s1 == 'Åäö', $s2 == 'Övrigt', $s3 == '隣のトトロ');
    

    Note you will need this in your php.ini, if it's not already there:

    extension = mbstring
    

    https://php.net/function.mb-convert-case