Tags: c++, windows, unicode, utf-8, utf-16

UTF-16 to UTF-8 with WideCharToMultiByte problems


#include <windows.h>

int main() {
    // "Chào" in Vietnamese
    wchar_t utf16[] = L"\x00ff\x00fe\x0043\x0000\x0068\x0000\x00EO\x0000\x006F";
    // Dump utf16: FF FE 43 0 68 0 E 4F 0 6F (right)
    int size = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, NULL, 0, NULL, NULL);
    char *utf8 = new char[size];
    int k = WideCharToMultiByte(CP_UTF8, 0, utf16, -1, utf8, size, NULL, NULL);
    // Dump utf8: ffffffc3 ffffffbf ffffffc3 ffffffbe 43 0
    delete[] utf8;
}

Here is my code. When I convert the string to UTF-8, it shows the wrong result. What is wrong with my code?


Solution

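  • #include <cstdio>

    int main() {
        // Assumes the source file is saved as UTF-8 and compiled that way (e.g. MSVC /utf-8)
        wchar_t utf16[] = L"\uFEFFChào";
        int size = 5;

        for (int i = 0; i < size; ++i) {
            std::printf("%X ", static_cast<unsigned>(utf16[i]));
        }
    }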
    

    This program prints out: FEFF 43 68 E0 6F

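    If printing each wchar_t you've read from a file gives FF FE 43 0 68 0 E 4F 0 6F, then the UTF-16 data is not being read from the file correctly. Each byte of the file has ended up in its own wchar_t, instead of each pair of bytes being combined into one UTF-16 code unit: FF FE is the little-endian byte-order mark, and the 0 values are the high bytes of the characters. Those values represent the UTF-16 string `L"ÿþC\0h\0\x0EO\0o"`. Because WideCharToMultiByte with a length of -1 processes the input only up to and including the first null character, converting that string yields just ÿ, þ and C, which is exactly what your UTF-8 dump shows (the ffffff prefixes are only sign extension from printing signed char values). The E 4F pair comes from `\x00EO` in your literal, where the letter O was typed instead of the digit 0; à is `\x00E0`.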

    You don't show your code for reading from the file, but here's one way to do it correctly:

    https://stackoverflow.com/a/10504278/365496