I have this code to convert letters to uppercase:
```cpp
// make this character upper
if(_istalpha(zChar) && !_istupper(zChar))
    pMsg->wParam = (WPARAM)_toupper(zChar);
```
It has worked for years. Recently I was asked to support Arabic, and my users reported that letters were getting corrupted. The code above is the cause.
I am told that uppercase does not apply in Arabic. I know I could check my program's settings to see whether Arabic is in use and skip this code, but is there another way?
I know that with dates, for example, you call _tsetlocale first.
Update:
Located this topic about toupper which mentions the locale setting! Will try it.
As you've discovered, the classic conversion routines like the CRT's toupper and Win32's CharUpper are rather dumb. They generally hail from the time when all the world was assumed to be ASCII.
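To see why the question's code corrupts Arabic text specifically: MSVC documents _toupper as intended only for lowercase letters, and the sketch below assumes it boils down to plain ASCII arithmetic with no range check. Assuming a Unicode build, an Arabic letter such as beh (U+0628) is alphabetic but not uppercase, so it passes both tests and reaches the subtraction. AsciiOnlyToUpper is a hypothetical name used only for illustration; it is not a CRT function.

```cpp
#include <cassert>

// Hypothetical helper reproducing the ASCII-only arithmetic assumed for
// _toupper: shift a lowercase ASCII letter down by 'a' - 'A' (0x20).
// There is no range check, so any other input is silently mangled.
wchar_t AsciiOnlyToUpper(wchar_t c)
{
    return static_cast<wchar_t>(c - L'a' + L'A');
}
```

With this sketch, AsciiOnlyToUpper(L'a') gives L'A' as expected, but AsciiOnlyToUpper(L'\u0628') gives L'\u0608', an unrelated code point, which matches the kind of corruption described in the question.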
What you need is a linguistically-sensitive conversion. This is a computationally more expensive operation, and also a very difficult one to implement correctly. Languages are hard. So you want to offload the responsibility to a standard library if at all possible. Since you're using MFC, you're obviously targeting the Windows operating system, which means you're in luck. You can piggyback on the hard work of Microsoft's localization engineers, with the additional benefit of consistency with the shell and other OS components.
The function you need to call is LCMapStringEx (or LCMapString if you are still targeting pre-Vista platforms). The complexity of this function's signature is strong testament to how complicated proper linguistically-aware string handling is.
For the locale, you'll probably want LOCALE_NAME_USER_DEFAULT, but you can use anything you want here. For the mapping flags, pass LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING. To do the reverse operation, you'd use LCMAP_LOWERCASE | LCMAP_LINGUISTIC_CASING. There are lots of other interesting and useful flags to keep in mind, too.

Putting it all together:
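```cpp
#include <windows.h>
#include <string>

// Uppercases `buffer` in place using linguistic casing rules.
// Returns TRUE on success; on failure, call GetLastError() for details.
BOOL ConvertToUppercase(std::wstring& buffer)
{
    if (buffer.empty())
        return TRUE;  // nothing to map

    const int length = static_cast<int>(buffer.length());
    return LCMapStringEx(LOCALE_NAME_USER_DEFAULT /* or whatever locale you want */,
                         LCMAP_UPPERCASE | LCMAP_LINGUISTIC_CASING,
                         buffer.c_str(),
                         length,
                         &buffer[0],  // same buffer as the source: in-place mapping
                         length,
                         NULL,
                         NULL,
                         0) != 0;     // LCMapStringEx returns 0 on failure
}
```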
Note that I'm doing an in-place conversion of the buffer's contents here, and am therefore assuming that the uppercased string is exactly the same length as the original input. This is probably true, but it may not be a universally safe assumption, so you will want to handle the failure case (LCMapStringEx returns 0 and GetLastError reports ERROR_INSUFFICIENT_BUFFER) and/or defensively add some extra padding to the buffer.
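One way to drop the same-length assumption entirely is the two-call pattern LCMapStringEx supports: pass 0 for cchDest to get the required size, then call again with a buffer of that size. Because the real API only exists on Windows, this is a sketch built around a hypothetical stand-in, FakeMapUpper, that follows the same size-query convention; in a real program you would call LCMapStringEx in its place. The stand-in expands U+00DF (sharp s) to "SS", as Unicode's full case mapping does, purely so the output can be longer than the input; that is not a claim about what LCMapStringEx itself does with that character.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in mimicking LCMapStringEx's size convention:
// cchDest == 0 returns the required length; otherwise it writes the result
// and returns the count written, or 0 if the buffer is too small.
int FakeMapUpper(const wchar_t* src, int cchSrc, wchar_t* dest, int cchDest)
{
    std::wstring out;
    for (int i = 0; i < cchSrc; ++i) {
        const wchar_t c = src[i];
        if (c == L'\u00DF')
            out += L"SS";  // one character becomes two
        else if (c >= L'a' && c <= L'z')
            out += static_cast<wchar_t>(c - L'a' + L'A');
        else
            out += c;
    }
    const int needed = static_cast<int>(out.size());
    if (cchDest == 0)
        return needed;  // size query only
    if (cchDest < needed)
        return 0;       // the real API would report ERROR_INSUFFICIENT_BUFFER
    out.copy(dest, needed);
    return needed;
}

// Two-call pattern: query the required size, allocate, then map for real.
bool ToUpperTwoPass(const std::wstring& input, std::wstring& output)
{
    if (input.empty()) {
        output.clear();  // skip the call entirely for empty input
        return true;
    }
    const int srcLen = static_cast<int>(input.size());
    const int needed = FakeMapUpper(input.c_str(), srcLen, nullptr, 0);
    if (needed == 0)
        return false;
    output.resize(needed);
    const int written = FakeMapUpper(input.c_str(), srcLen, &output[0], needed);
    if (written == 0)
        return false;
    output.resize(written);
    return true;
}
```

Here ToUpperTwoPass(L"stra\u00DFe", out) produces L"STRASSE", a string longer than its input, which an in-place, same-length call would have rejected.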
If you'd prefer to use CRT functions like you're doing now, _totupper_l and its friends are wrappers around LCMapString/LCMapStringEx. Note the _l suffix: these variants take an explicit locale argument, which is used for the conversion instead of the current locale set with _tsetlocale.
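_totupper_l and its locale objects are specific to the Microsoft CRT, so as a portable sketch of the same idea (handing the conversion an explicit locale rather than relying on the global one), here is the standard C++ counterpart: the std::ctype<wchar_t> facet of a std::locale. UppercaseWithLocale is a hypothetical helper name, and the classic "C" locale is used only so the example runs anywhere; a real program would pass the locale it actually cares about.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Uppercases a copy of `text` using the case rules of an explicitly
// supplied locale, in the spirit of the CRT's _l functions.
std::wstring UppercaseWithLocale(std::wstring text, const std::locale& loc)
{
    const auto& facet = std::use_facet<std::ctype<wchar_t>>(loc);
    if (!text.empty())
        facet.toupper(&text[0], &text[0] + text.size());  // maps each character in place
    return text;
}
```

Unlike the _toupper arithmetic in the question, a real case-mapping function leaves characters that have no case alone: UppercaseWithLocale(L"\u0628\u0627\u0628", std::locale::classic()) returns the Arabic input unchanged, while L"abc" becomes L"ABC".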