c++ c++builder unicode-string

c++ towupper() doesn't convert certain characters


I use Borland C++ Builder 2009 and my application is translated into several languages, including Polish.

For a small piece of functionality I use towupper() to capitalize a string, to emphasize it when the user ignores it the first time.

The original string is loaded from a language DLL into a UTF-16 wstring object, and I convert it like this:

int length = mystring.length() ;
for (int x = 0 ; x < length ; x++)
    {
    mystring[x] = towupper(mystring[x]);
    }

All this works well, except for Polish, where the following sentence: "Rozumiem ryzykowność wykonania tej operacji" converts to "ROZUMIEM RYZYKOWNOść WYKONANIA TEJ OPERACJI" instead of "ROZUMIEM RYZYKOWNOŚĆ WYKONANIA TEJ OPERACJI"

(notice that the two last characters of the word "ryzykowność" do not convert).

It's not as if there are no capitalized Unicode variants of these characters available. For example, Unicode character 346 (U+015A, Ś) is the uppercase form of ś: http://www.fileformat.info/info/unicode/char/015a/index.htm

Is this a matter of an outdated library in my outdated compiler installation, or am I missing something else?


Solution

  • Implementations of towupper are not required by the C++ standard to perform Unicode case conversions, even if wide strings are Unicode strings, and even in cases where one lower-case code point maps to exactly one upper-case code point.

    Furthermore, towupper is incapable of performing proper Unicode case conversion, even if the implementation supports Unicode. Case conversion can change the number of code points in a Unicode character sequence (for example, the German ß uppercases to SS), and towupper, which maps one wide character to exactly one wide character, cannot do that.

    You cannot rely on the C++ standard library for dealing with Unicode matters of this sort. You'll need to move to a dedicated Unicode library like ICU.
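To illustrate the point about code point counts, here is a minimal, self-contained sketch of a string-level uppercase routine. The function name `upper_sketch` and its tiny hand-written mapping table are assumptions made for this example only; the table covers just the two Polish letters from the question plus ß, and is in no way a substitute for the full Unicode case mapping data that a library such as ICU ships.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper for illustration only: uppercases ASCII letters and a
// tiny hand-picked subset of non-ASCII letters. Real code should use a
// Unicode library instead of a hand-written table.
std::u16string upper_sketch(const std::u16string& in)
{
    // Each entry maps one lower-case UTF-16 code unit to its upper-case
    // replacement, which may be longer than one code unit.
    static const std::unordered_map<char16_t, std::u16string> special = {
        {u'\u015B', u"\u015A"},  // ś -> Ś
        {u'\u0107', u"\u0106"},  // ć -> Ć
        {u'\u00DF', u"SS"},      // ß -> SS (one code point becomes two)
    };

    std::u16string out;
    out.reserve(in.size());
    for (char16_t c : in) {
        auto it = special.find(c);
        if (it != special.end()) {
            out += it->second;  // may append more than one code unit
        } else if (c >= u'a' && c <= u'z') {
            out += static_cast<char16_t>(c - (u'a' - u'A'));  // plain ASCII
        } else {
            out += c;  // everything else is left unchanged
        }
    }
    return out;
}
```

Because the function builds a new string rather than assigning back into `mystring[x]`, the result is allowed to grow: `upper_sketch(u"stra\u00DFe")` yields a 7-character string from a 6-character input, which an in-place per-character loop cannot express. With ICU, the equivalent job would typically be done by `icu::UnicodeString::toUpper`, which also accepts a locale for language-specific rules.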