I am trying to convert Japanese characters stored in a wide-character string to UTF-8, in order to store the value in a JSON file using the cJSON library. I first tried wcstombs_s, but apparently it does not support Japanese characters:
size_t len = wcslen(japanese[i].name) + 1;
char* japanese_char = malloc(len);
if (japanese_char == NULL) {
    exit(EXIT_FAILURE);
}
size_t sz;
wcstombs_s(&sz, japanese_char, len, japanese[i].name, _TRUNCATE);
Then, based on other answers and on a successful conversion in the other direction (JSON UTF-8 to wide char), I tried the opposite function as follows, but the destination buffer dest contains only garbage characters:
size_t wcsChars = wcslen(japanese[i].name);
size_t sizeRequired = WideCharToMultiByte(CP_UTF8, 0, japanese[i].name, wcsChars, NULL, 0, NULL, NULL);
char* dest = calloc(sizeRequired, 1);
WideCharToMultiByte(CP_UTF8, 0, japanese[i].name, wcsChars, dest, sizeRequired, NULL, NULL);
free(dest);
The wide-character (wchar_t) string I am trying to convert is ササササササササササササササササ, stored in japanese[i].name (a wchar_t* in a struct). The objective is to use cJSON's cJSON_CreateString to save the value in a UTF-8 encoded JSON file.
Question: What is the proper way to convert Japanese from wchar_t to UTF-8 char in C (not C++)?
Your wcstombs_s() code is passing the wrong value to the sizeInBytes parameter:
sizeInBytes
The size in bytes of the mbstr buffer.
You are sizing japanese_char from the character count of japanese[i].name and passing that same count as sizeInBytes. A character count is not the number of bytes the converted string needs.
Unicode codepoints are encoded in UTF-16 (what wchar_t strings are encoded as on Windows) using 2 or 4 bytes each, and in UTF-8 using 1-4 bytes each, depending on their value. Codepoints in the U+0800..U+FFFF range, which includes Japanese kana and kanji, take 3 bytes in UTF-8 but only 2 bytes in UTF-16, so your japanese_char buffer may need to be larger in bytes than your japanese[i].name data, and much larger than its character count. Keep in mind that wcstombs_s() converts to the multibyte encoding of the current C locale, so the result is UTF-8 only if a UTF-8 locale is in effect. Just like you can call WideCharToMultiByte() to determine the destination buffer size needed, you can do the same thing with wcstombs_s().
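size_t len = 0;
/* mbstr is NULL here, so this call only reports the required size in bytes, including the null terminator. */
wcstombs_s(&len, NULL, 0, japanese[i].name, _TRUNCATE);
if (len == 0)
    exit(EXIT_FAILURE);
char* japanese_char = malloc(len);
if (!japanese_char)
    exit(EXIT_FAILURE);
/* Second call performs the conversion into the correctly sized buffer. */
wcstombs_s(&len, japanese_char, len, japanese[i].name, _TRUNCATE);
...
free(japanese_char);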
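To make the byte counts above concrete, here is a minimal sketch of two helpers that report how many bytes a single code point occupies in each encoding. The names utf8_bytes and utf16_bytes are made up for this illustration; they are not part of the CRT or the Windows API.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Bytes needed to encode code point cp in UTF-8, or 0 if cp is not encodable. */
static int utf8_bytes(uint32_t cp)
{
    if (cp <= 0x7F)                   return 1;
    if (cp <= 0x7FF)                  return 2;
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* surrogate values are not code points */
    if (cp <= 0xFFFF)                 return 3;
    if (cp <= 0x10FFFF)               return 4;
    return 0;
}

/* Bytes needed to encode code point cp in UTF-16, or 0 if cp is not encodable. */
static int utf16_bytes(uint32_t cp)
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;
    if (cp <= 0xFFFF)                 return 2; /* one 16-bit code unit */
    if (cp <= 0x10FFFF)               return 4; /* surrogate pair */
    return 0;
}
```

For サ (U+30B5), utf8_bytes returns 3 while utf16_bytes returns 2, which is why a buffer sized from wcslen() is too small for this string.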
Your WideCharToMultiByte() code is not null-terminating dest due to you passing an explicit size to the cchWideChar parameter.
cchWideChar
Size, in characters, of the string indicated by lpWideCharStr. Alternatively, this parameter can be set to -1 if the string is null-terminated. If cchWideChar is set to 0, the function fails.
If this parameter is -1, the function processes the entire input string, including the terminating null character. Therefore, the resulting character string has a terminating null character, and the length returned by the function includes this character.
If this parameter is set to a positive integer, the function processes exactly the specified number of characters. If the provided size does not include a terminating null character, the resulting character string is not null-terminated, and the returned length does not include this character.
cJSON_CreateString() expects a null-terminated char* string. So you need to either:
- add 1 to the num parameter of calloc() (or to the size passed to malloc(), as below) to account for the missing null terminator.
size_t wcsChars = wcslen(japanese[i].name);
size_t len = WideCharToMultiByte(CP_UTF8, 0, japanese[i].name, wcsChars, NULL, 0, NULL, NULL);
char* japanese_char = malloc(len + 1); /* +1 for the null terminator */
if (!japanese_char)
    exit(EXIT_FAILURE);
WideCharToMultiByte(CP_UTF8, 0, japanese[i].name, wcsChars, japanese_char, len, NULL, NULL);
japanese_char[len] = '\0'; /* terminate manually; the input length excluded the terminator */
...
free(japanese_char);
- add 1 to the return value of wcslen(), or set the cchWideChar parameter of WideCharToMultiByte() to -1, to include the null terminator in the output.
size_t wcsChars = wcslen(japanese[i].name) + 1;
size_t len = WideCharToMultiByte(CP_UTF8, 0, japanese[i].name, wcsChars, NULL, 0, NULL, NULL);
if (len == 0)
    exit(EXIT_FAILURE);
char* japanese_char = malloc(len);
if (!japanese_char)
    exit(EXIT_FAILURE);
WideCharToMultiByte(CP_UTF8, 0, japanese[i].name, wcsChars, japanese_char, len, NULL, NULL);
...
free(japanese_char);
Or, passing -1 so that the function processes the null terminator itself:
size_t len = WideCharToMultiByte(CP_UTF8, 0, japanese[i].name, -1, NULL, 0, NULL, NULL);
if (len == 0)
    exit(EXIT_FAILURE);
char* japanese_char = malloc(len);
if (!japanese_char)
    exit(EXIT_FAILURE);
WideCharToMultiByte(CP_UTF8, 0, japanese[i].name, -1, japanese_char, len, NULL, NULL);
...
free(japanese_char);
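If you want to see what the sizing call and the conversion call are doing, or need the same result off Windows, here is a minimal portable sketch of the same two-pass pattern (measure, allocate with room for the terminator, convert, terminate). It assumes the input is null-terminated UTF-16 held in uint16_t code units, which is what wchar_t holds on Windows. utf16_to_utf8, convert and put_utf8 are hypothetical helpers written for this sketch, not cJSON or Windows functions, and replacing unpaired surrogates with U+FFFD is a choice made here rather than a description of what WideCharToMultiByte() does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Writes code point cp as UTF-8 into out (skipped when out is NULL)
   and returns the number of bytes the encoding occupies. */
static size_t put_utf8(uint32_t cp, char *out)
{
    if (cp <= 0x7F) {
        if (out) out[0] = (char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {
        if (out) {
            out[0] = (char)(0xC0 | (cp >> 6));
            out[1] = (char)(0x80 | (cp & 0x3F));
        }
        return 2;
    }
    if (cp <= 0xFFFF) {
        if (out) {
            out[0] = (char)(0xE0 | (cp >> 12));
            out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[2] = (char)(0x80 | (cp & 0x3F));
        }
        return 3;
    }
    if (out) {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
    }
    return 4;
}

/* One pass over a null-terminated UTF-16 string. With dst == NULL it only
   counts. Returns the UTF-8 length in bytes, excluding the terminator. */
static size_t convert(const uint16_t *src, char *dst)
{
    size_t n = 0;
    while (*src) {
        uint32_t cp = *src++;
        if (cp >= 0xD800 && cp <= 0xDBFF && *src >= 0xDC00 && *src <= 0xDFFF) {
            /* Valid surrogate pair: combine into one code point. */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(*src++ - 0xDC00);
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD; /* unpaired surrogate: substitute U+FFFD (sketch's choice) */
        }
        n += put_utf8(cp, dst ? dst + n : NULL);
    }
    return n;
}

/* Returns a malloc'd, null-terminated UTF-8 copy of src, or NULL on
   allocation failure. The caller frees the result. */
char *utf16_to_utf8(const uint16_t *src)
{
    assert(src != NULL);
    size_t len = convert(src, NULL); /* pass 1: measure */
    char *dst = malloc(len + 1);     /* +1 for the null terminator */
    if (!dst)
        return NULL;
    convert(src, dst);               /* pass 2: convert */
    dst[len] = '\0';
    return dst;
}
```

The returned pointer is an ordinary null-terminated char* string, so it can be handed to cJSON_CreateString() and then freed, just like japanese_char in the snippets above.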