How to uppercase/lowercase UTF-8 characters in C++?

Let's imagine I have a UTF-8 encoded std::string containing the following:

óó

and I'd like to convert it to the following:

ÓÓ

Ideally I want the uppercase/lowercase approach I'm using to be generic across all of UTF-8. If that's even possible.

The original byte sequence in the string is 0xc3b3c3b3 (two bytes per character, and two instances of ó) and I'd like the output to be 0xc393c393 (two instances of Ó). There are some examples on Stack Overflow, but they use wide character strings, and other answers say you shouldn't be using wide character strings for UTF-8. It also appears that this problem can be very "tricky" in that the output might depend on the user's locale.

I was expecting to just use something like std::toupper(), but the usage is really unclear to me because it seems like I'm not just converting one character at a time but an entire string. Also, this Ideone example I put together seems to show that toupper() of 0xc3b3 is just 0xc3b3, which is an unexpected result. Calling setlocale to either UTF-8 or ISO8859-1 doesn't appear to change the outcome.

I'd love some guidance if you could shed some light on either what I'm doing wrong or why my question/premise is faulty!

Solution

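There is no standard way to do Unicode case conversion in C++. std::toupper() works on a single char at a time, and in UTF-8 ó is two bytes (0xc3 0xb3), neither of which is a letter on its own, so the string comes back unchanged. Approaches built on locales and the wide-character functions (such as std::towupper()) work on some C++ implementations, but the standard doesn't require them to.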
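As a rough sketch of one such implementation-dependent approach: switch LC_CTYPE to a UTF-8 locale, widen the string with std::mbstowcs(), apply std::towupper() to each wide character, and narrow it back with std::wcstombs(). The helper name upper_via_locale and the locale name "en_US.UTF-8" are my own assumptions, not anything the standard guarantees. The locale may not be installed, the result depends on the C library's tables, and only one-to-one mappings are possible this way.

```cpp
#include <clocale>
#include <cstddef>
#include <cstdlib>
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical helper (name and default locale are assumptions).
// Returns false if the locale is unavailable or the input cannot be converted.
bool upper_via_locale(const std::string& utf8, std::string& out,
                      const char* locale_name = "en_US.UTF-8") {
    // Remember the current LC_CTYPE setting so it can be restored afterwards.
    const char* current = std::setlocale(LC_CTYPE, nullptr);
    const std::string saved = current ? current : "C";
    if (!std::setlocale(LC_CTYPE, locale_name)) {
        return false;  // locale not installed on this system
    }

    bool ok = false;
    // Each wide character consumes at least one byte, so size() + 1 is enough.
    std::vector<wchar_t> wide(utf8.size() + 1);
    const std::size_t n = std::mbstowcs(wide.data(), utf8.c_str(), wide.size());
    if (n != static_cast<std::size_t>(-1)) {
        for (std::size_t i = 0; i < n; ++i) {
            wide[i] = static_cast<wchar_t>(
                std::towupper(static_cast<std::wint_t>(wide[i])));
        }
        // MB_CUR_MAX is the longest multibyte sequence in the active locale.
        std::vector<char> narrow(n * MB_CUR_MAX + 1);
        const std::size_t m =
            std::wcstombs(narrow.data(), wide.data(), narrow.size());
        if (m != static_cast<std::size_t>(-1)) {
            out.assign(narrow.data(), m);
            ok = true;
        }
    }

    std::setlocale(LC_CTYPE, saved.c_str());  // restore the previous locale
    return ok;
}
```

If the call returns false, the locale is missing or the bytes are not valid in it, which is exactly the kind of portability gap described above. Note also that std::setlocale() changes process-wide state, so this is not something to call from multiple threads.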

If you want guaranteed Unicode case conversion, you will need to use a library like ICU or Boost.Locale (which, when built with its ICU backend, is essentially ICU with a more C++-like interface).
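To give a feel for what such a library does under the hood, here is a deliberately tiny sketch: decode the UTF-8 bytes to a code point, map the code point, and re-encode. The function name upper_latin1_subset is hypothetical, and the mapping covers only ASCII and U+00E0..U+00FE (skipping U+00F7, the division sign). It is not a substitute for ICU, which ships the full Unicode tables and handles mappings that change length, such as ß to SS.

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration only: uppercases ASCII a..z and the Latin-1
// Supplement lowercase letters U+00E0..U+00FE (except U+00F7).
// Every other byte sequence, including malformed input, is copied unchanged.
std::string upper_latin1_subset(const std::string& utf8) {
    std::string out;
    out.reserve(utf8.size());
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 >= 'a' && b0 <= 'z') {
            out += static_cast<char>(b0 - 0x20);  // ASCII a..z -> A..Z
            continue;
        }
        // Lead byte 0xC3 starts the two-byte encodings of U+00C0..U+00FF.
        if (b0 == 0xC3 && i + 1 < utf8.size()) {
            const unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            if ((b1 & 0xC0) == 0x80) {  // valid continuation byte
                unsigned cp = 0xC0u | (b1 & 0x3Fu);  // decode the code point
                if (cp >= 0xE0 && cp <= 0xFE && cp != 0xF7) {
                    cp -= 0x20;  // lowercase letter -> uppercase counterpart
                }
                out += static_cast<char>(0xC0u | (cp >> 6));    // re-encode lead
                out += static_cast<char>(0x80u | (cp & 0x3Fu)); // re-encode tail
                ++i;  // consumed the continuation byte as well
                continue;
            }
        }
        out += utf8[i];  // anything else passes through untouched
    }
    return out;
}
```

For the string in the question, the bytes 0xc3 0xb3 decode to U+00F3, map to U+00D3, and re-encode as 0xc3 0x93. Everything outside that narrow range is left alone, which is the gap a real library fills.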