I'm trying to convert a UTF-16 string into UTF-8 and hit a little wall. The output string contains the characters, but with blank spaces in between!? The input is hi\0, and if I look at the output, it says h\0i\0 instead of hi\0.
Do you see the problem here? Many thanks!
size_t len16 = 3 * sizeof(wchar_t);
size_t len8 = 7;
wchar_t utf16[3] = { 0x0068, 0x0069, 0x0000 }, *_utf16 = utf16;
char utf8[7], *_utf8 = utf8;
iconv_t utf16_to_utf8 = iconv_open("UTF-8", "UTF-16LE");
size_t result = iconv(utf16_to_utf8, (char **)&_utf16, &len16, &_utf8, &len8);
printf("%d - %s\n", (int)result, utf8);
iconv_close(utf16_to_utf8);
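The input data for iconv is always an opaque byte stream. When reading UTF-16, iconv expects the input data to consist of two-byte code units. Your array is of wchar_t, however, and on typical POSIX systems (glibc on Linux, for example) wchar_t is four bytes wide. On a little-endian machine each of your characters therefore turns into two UTF-16 code units, the character itself followed by 0x0000, which is where the \0 after every character comes from. So if you want to provide hard-coded input data, you need to use an integral type that is two bytes wide.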
In C++11 and C11 this should be char16_t, but you can also use uint16_t:
uint16_t data[] = { 0x68, 0x69, 0 };  /* two bytes per UTF-16 code unit */
char * p = (char *)data;             /* POSIX iconv() takes char ** for the input */
To be pedantic, there's nothing in general that says that uint16_t has two bytes (on a platform with 16-bit bytes it would be a single byte). However, iconv is a POSIX library, and POSIX mandates that CHAR_BIT == 8, so it is true on POSIX.
(Also note that the way you spell a literal value has nothing to do with the width of the type that you initialize with that value, so there's no difference between 0x68, 0x0068, or 0x00068. What's much more interesting are the Unicode escape sequences \u and \U, but that's a whole different story.)