
C programming: How can I program for Unicode?


What prerequisites are needed to do strict Unicode programming?

Does this imply that my code should not use char types anywhere, and that I need to use functions that can deal with wint_t and wchar_t?

And what is the role played by multibyte character sequences in this scenario?


Solution

  • Note that this is not about "strict Unicode programming" per se, but about some practical experience.

    At my company, we created a wrapper library around IBM's ICU library. The wrapper library has a UTF-8 interface and converts to UTF-16 when it is necessary to call ICU. In our case, we did not worry too much about performance hits. When performance was an issue, we also supplied UTF-16 interfaces (using our own datatype).

    Applications can remain largely as they are (using char), although in some cases they need to be aware of certain issues. For instance, instead of strncpy(), we use a wrapper that avoids cutting a multi-byte UTF-8 sequence in half. In our case, this is sufficient, but one could also consider checks for combining characters. We also have wrappers for counting the number of code points, the number of graphemes, etc.

    When interfacing with other systems, we sometimes need to do custom character composition, so you may need some flexibility there (depending on your application).

    We do not use wchar_t, whose size differs between platforms (16 bits on Windows, typically 32 bits on Linux). Using ICU avoids unexpected portability issues (but not other unexpected issues, of course :-).
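The wrapper library described above exposes UTF-8 but has to hand UTF-16 to ICU. ICU ships its own converter for that step (u_strFromUTF8 in unicode/ustring.h), which a real wrapper would presumably call; the following is only a minimal, ICU-free sketch of what the conversion involves. The name utf8_to_utf16 and its error convention (-1 for malformed input or a too-small buffer) are hypothetical, and uint16_t stands in for ICU's UChar.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of ICU): converts a NUL-terminated UTF-8
 * string to NUL-terminated UTF-16. Returns the number of UTF-16 code units
 * written (excluding the terminator), or -1 if the input is malformed or
 * dst_cap is too small. uint16_t stands in for ICU's UChar. */
static long utf8_to_utf16(const char *src, uint16_t *dst, size_t dst_cap)
{
    /* Smallest code point allowed for each sequence length (rejects overlongs). */
    static const uint32_t min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    assert(src != NULL && dst != NULL);

    while (*s) {
        uint32_t cp;
        int extra; /* number of continuation bytes after the lead byte */

        if (s[0] < 0x80)                { cp = s[0];        extra = 0; }
        else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; extra = 1; }
        else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; extra = 2; }
        else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; extra = 3; }
        else return -1; /* stray continuation byte or invalid lead byte */

        for (int i = 1; i <= extra; i++) {
            /* Also catches a sequence cut short by the terminating NUL. */
            if ((s[i] & 0xC0) != 0x80)
                return -1;
            cp = (cp << 6) | (s[i] & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1; /* overlong, out of range, or an encoded surrogate */
        s += extra + 1;

        if (cp < 0x10000) {
            /* BMP code point: one code unit (keep room for the terminator). */
            if (n + 1 >= dst_cap)
                return -1;
            dst[n++] = (uint16_t)cp;
        } else {
            /* Supplementary code point: encode as a surrogate pair. */
            if (n + 2 >= dst_cap)
                return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= dst_cap)
        return -1; /* no room for the terminator (only possible if dst_cap == 0) */
    dst[n] = 0;
    return (long)n;
}
```

For example, the UTF-8 string "aé€😀" converts to five UTF-16 code units, because U+1F600 lies outside the Basic Multilingual Plane and needs a surrogate pair (0xD83D 0xDE00). This is also why a UTF-16 interface does not free you from variable-length sequences.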
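A strncpy() replacement that never cuts a UTF-8 sequence in half, plus a code-point counter like the wrappers mentioned in the answer, could look roughly like this. Both function names are hypothetical (they are not the answerer's actual wrappers), both assume the input is already valid UTF-8, and neither knows about combining characters or graphemes; counting graphemes properly needs something like ICU's character break iterator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strncpy() replacement: copies as much of the NUL-terminated
 * UTF-8 string src as fits into dst (dst_size bytes including the NUL),
 * always NUL-terminates, and never stops in the middle of a multi-byte
 * sequence. Assumes src is valid UTF-8. Returns the number of bytes copied,
 * excluding the terminator. Combining characters are NOT kept together. */
static size_t utf8_copy_truncate(char *dst, const char *src, size_t dst_size)
{
    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return 0;

    size_t len = strlen(src);
    if (len >= dst_size) {
        len = dst_size - 1;
        /* If src[len] is a continuation byte (10xxxxxx), the cut would split
         * a sequence: move back to that sequence's lead byte and drop it whole. */
        while (len > 0 && ((unsigned char)src[len] & 0xC0) == 0x80)
            len--;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}

/* Hypothetical helper: counts code points (not graphemes) by counting every
 * byte that is not a continuation byte. Assumes valid UTF-8. */
static size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++)
        if ((*p & 0xC0) != 0x80)
            count++;
    return count;
}
```

With a 5-byte destination, copying "ab€c" keeps only "ab", since the three bytes of € would not fit alongside the terminator. Backing up over continuation bytes works because UTF-8 is self-synchronizing: a lead byte can never be mistaken for a continuation byte, so the sequence boundary is always found by looking at the bytes alone.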