This intrigues me, so I'm going to ask: why is `wchar_t` not used as widely on Linux and Linux-like systems as it is on Windows? Specifically, the Windows API uses `wchar_t` internally, whereas I believe Linux does not, and this is reflected in a number of open source packages using `char` types.
My understanding is that given a character `c` which requires multiple bytes to represent it, `c` is split across several elements of a `char[]`, whereas it forms a single unit in a `wchar_t[]`. Isn't it easier, then, to always use `wchar_t`? Have I missed a technical reason that negates this difference, or is it just an adoption problem?
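To make the difference concrete, here is a rough sketch of what I mean (it assumes the compiler's source and execution character sets are UTF-8, as on a typical Linux toolchain):

```c
#include <stdio.h>
#include <string.h>
#include <wchar.h>

int main(void) {
    /* U+00E9 ("é") needs two bytes in UTF-8, so it occupies two
       elements of a char array... */
    char narrow[] = "\u00E9";
    printf("char elements:    %zu\n", strlen(narrow));  /* 2 */

    /* ...but only one element of a wchar_t array. */
    wchar_t wide[] = L"\u00E9";
    printf("wchar_t elements: %zu\n", wcslen(wide));    /* 1 */
    return 0;
}
```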
`wchar_t` is a wide character whose width is platform-defined, which doesn't really help much.
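A quick check of the width on whatever platform you're on (typically 4 bytes on Linux/glibc and 2 on Windows, but the standard promises neither):

```c
#include <stdio.h>
#include <wchar.h>

int main(void) {
    /* Typically prints 4 on Linux/glibc (UTF-32) and 2 on Windows (UTF-16);
       the C standard itself fixes no particular width. */
    printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
    return 0;
}
```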
UTF-8 encodes each character in 1 to 4 bytes. UCS-2, which uses exactly 2 bytes per character, is now obsolete and can't represent the full Unicode character set.
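As a rough illustration of UTF-8's variable width (again assuming the execution character set is UTF-8, the default on typical Linux toolchains):

```c
#include <stdio.h>
#include <string.h>

int main(void) {
    /* Each string holds a single character; strlen counts its UTF-8 bytes. */
    printf("U+0041 (A):       %zu byte(s)\n", strlen("A"));           /* 1 */
    printf("U+00E9 (e-acute): %zu byte(s)\n", strlen("\u00E9"));      /* 2 */
    printf("U+20AC (euro):    %zu byte(s)\n", strlen("\u20AC"));      /* 3 */
    printf("U+1F600 (emoji):  %zu byte(s)\n", strlen("\U0001F600"));  /* 4 */
    return 0;
}
```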
Linux applications that support Unicode tend to do so properly, above the byte-wise storage layer. Windows applications tend to make the silly assumption that two bytes per character will always do.
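Here is a small sketch of where that assumption breaks down: any code point outside the Basic Multilingual Plane needs a surrogate pair in UTF-16, i.e. two 16-bit units for one character (this uses C11's `char16_t` purely for illustration):

```c
#include <stdio.h>
#include <uchar.h>

int main(void) {
    /* U+1F600 is outside the BMP, so UTF-16 stores it as a surrogate
       pair: two 16-bit code units for a single character. */
    char16_t emoji[] = u"\U0001F600";
    size_t units = sizeof emoji / sizeof emoji[0] - 1;  /* drop terminator */
    printf("UTF-16 code units for U+1F600: %zu\n", units);  /* 2, not 1 */
    return 0;
}
```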
The Wikipedia article on `wchar_t` briefly touches on this.