Search code examples
cunicodewchar-t

Why isn't wchar_t widely used in code for Linux / related platforms?


This intrigues me, so I'm going to ask - for what reason is wchar_t not used so widely on Linux/Linux-like systems as it is on Windows? Specifically, the Windows API uses wchar_t internally whereas I believe Linux does not and this is reflected in a number of open source packages using char types.

My understanding is that given a character c which requires multiple bytes to represent it, then in a char[] form c is split over several parts of char* whereas it forms a single unit in wchar_t[]. Is it not easier, then, to use wchar_t always? Have I missed a technical reason that negates this difference? Or is it just an adoption problem?


Solution

  • wchar_t is a wide character with platform-defined width, which doesn't really help much.

    UTF-8 characters span 1-4 bytes per character. UCS-2, which spans exactly 2 bytes per character, is now obsolete and can't represent the full Unicode character set.

    Linux applications that support Unicode tend to do so properly, above the byte-wise storage layer. Windows applications tend to make this silly assumption that only two bytes will do.

    wchar_t's Wikipedia article briefly touches on this.