unicode, cjk, cstring

CString m_pszData doesn't match the converted char* in a UNICODE build


I tested Unicode conversion in a UNICODE MFC dialog app where I can type some Chinese into an edit box. After reading the characters in with

    DDX_Text(pDX, IDC_EDIT1, m_strUnicode);   // in DoDataExchange()
    UpdateData(TRUE);

the debugger shows m_pszData of m_strUnicode as the bytes "e0 65 2d 4e 1f 75 09 67". Then I used the following code to convert it to char*:

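    // First call asks for the buffer size; cchWideChar = -1 makes it include the terminating NUL
    DWORD dwMinSize = WideCharToMultiByte(CP_OEMCP, 0, m_strUnicode, -1, NULL, 0, NULL, NULL);
    char *psText = new char[dwMinSize];
    WideCharToMultiByte(CP_OEMCP, 0, m_strUnicode, -1, psText, dwMinSize, NULL, NULL);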

psText then contains "ce de d6 d0 c9 fa d3 d0", which looks nothing like m_pszData of m_strUnicode. Could anyone explain why?


Solution

  • ce de d6 d0 c9 fa d3 d0 is 无中生有 in GBK. Are you sure you're looking at Unicode?


    CP_OEMCP instructs the API to use the currently set default OEM codepage.
    

    So my guess is that you're on a Chinese-locale PC whose default OEM code page is GBK (code page 936).

    In UTF-16LE, 无中生有 is e0 65 2d 4e 1f 75 09 67, so you are simply converting a UTF-16LE string to GBK. Both byte sequences encode the same four characters, so the conversion did what you asked.
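To check the answer by hand, here is a small, portable C++ sketch (not MFC code, and not how WideCharToMultiByte works internally). It decodes the eight bytes from m_pszData as little-endian UTF-16, then re-encodes the resulting characters with a hypothetical four-entry excerpt of the GBK table, `kGbkExcerpt`, filled in only with the byte values quoted in this thread. The names `decode_utf16le`, `encode_gbk` and `to_hex` are illustrative. It assumes every character is in the Basic Multilingual Plane, so surrogate pairs are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Decode little-endian UTF-16 bytes (low byte first) into 16-bit code units.
// Assumes every character is in the BMP, so surrogate pairs are not combined.
std::vector<char16_t> decode_utf16le(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() % 2 == 0);  // UTF-16 code units are two bytes each
    std::vector<char16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        units.push_back(static_cast<char16_t>(bytes[i] | (bytes[i + 1] << 8)));
    }
    return units;
}

// HYPOTHETICAL four-entry excerpt of GBK (code page 936), using only the byte
// values quoted in this thread. Real code would rely on the full table,
// e.g. through WideCharToMultiByte with code page 936.
const std::map<char16_t, std::uint16_t> kGbkExcerpt = {
    {u'\u65E0', 0xCEDE},  // 无
    {u'\u4E2D', 0xD6D0},  // 中
    {u'\u751F', 0xC9FA},  // 生
    {u'\u6709', 0xD3D0},  // 有
};

// Encode code units as GBK double-byte codes, lead byte first.
// Throws std::out_of_range for any character missing from the excerpt.
std::vector<std::uint8_t> encode_gbk(const std::vector<char16_t>& units) {
    std::vector<std::uint8_t> out;
    for (char16_t u : units) {
        std::uint16_t code = kGbkExcerpt.at(u);
        out.push_back(static_cast<std::uint8_t>(code >> 8));
        out.push_back(static_cast<std::uint8_t>(code & 0xFF));
    }
    return out;
}

// Format bytes the way the debugger shows them: lowercase hex, space-separated.
std::string to_hex(const std::vector<std::uint8_t>& bytes) {
    std::string s;
    char buf[4];
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (i != 0) s += ' ';
        std::snprintf(buf, sizeof buf, "%02x", static_cast<unsigned>(bytes[i]));
        s += buf;
    }
    return s;
}
```

Feeding it the bytes {0xe0, 0x65, 0x2d, 0x4e, 0x1f, 0x75, 0x09, 0x67}, `decode_utf16le` yields U+65E0 U+4E2D U+751F U+6709, and `to_hex(encode_gbk(...))` on those code units gives "ce de d6 d0 c9 fa d3 d0", the same bytes psText held. Each of these characters takes two bytes in both encodings, which is why both buffers are 8 bytes long before the terminator.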