Tags: c++, unicode, ansi

Incorrect character conversion on Japanese systems


I have a project that is compiled with the multibyte character set. The conversion below fails when msg1 contains Japanese characters.

bool MyClass::UnfoldEnvelope(BSTR msg1)
{
    CW2A msg(msg1);
    LPCTSTR p0 = msg;
    ....
}

On entry, msg1 is a BSTR that contains Unicode characters: a path name in Japanese. The CW2A conversion appears to work, in that after the call msg contains a string that is recognizably Japanese. However, the LPCTSTR assignment fails: after that line, p0 contains garbage. The string p0 is used subsequently in old code that I am reluctant to touch.

What is the correct way to get a pointer to the string "msg" in this case?

With English text everything works fine.


Solution

  • Try using WideCharToMultiByte. With CP_ACP it converts the wide-character string to the system's current ANSI code page, which is a Japanese multibyte code page on Japanese Windows; CW2A does the same kind of conversion. If your Windows is not Japanese but the text contains Japanese characters, use CP_UTF8 (UTF-8) instead and convert the text back to UTF-16 (wchar_t) at the point where it is used (displayed, printed, or used as a file name). To convert back, use the MultiByteToWideChar function.

    To be clear: an ANSI code page covers only a subset of Unicode. Windows uses the code page that matches the system locale (you can configure it in Control Panel). If the string contains characters outside that code page, you should keep all of the characters in Unicode. If you need to work with char-based strings and still preserve Unicode, convert the wchar_t string (all Windows wide characters are UTF-16) to a UTF-8 string.

    Check this source:

    bool MyClass::UnfoldEnvelope(BSTR msg1)
    {
        // Requires <windows.h> and <vector>.
        // First call: get the required buffer size in bytes (includes the terminating NUL)
        int new_size = WideCharToMultiByte( CP_UTF8, 0, msg1, -1, NULL, 0, NULL, NULL );
        if ( new_size <= 0 )
          return false;
        // std::vector gives a contiguous buffer that can be passed to C functions
        std::vector<char> p0(new_size);
        // Second call: convert the string from UTF-16 to UTF-8
        if ( WideCharToMultiByte( CP_UTF8, 0, msg1, -1, &p0[0], new_size, NULL, NULL ) <= 0 )
        {
          return false;
        }
        // use &p0[0] as a usual char-based string (save, load etc.)
        ....
        // get the required size in UTF-16 code units (includes the terminating NUL)
        new_size = MultiByteToWideChar( CP_UTF8, 0, &p0[0], -1, NULL, 0 );
        if ( new_size <= 0 )
          return false;
        // std::vector again, this time as a wchar_t buffer
        std::vector<wchar_t> p1w(new_size);
        // convert back to UTF-16
        if ( MultiByteToWideChar( CP_UTF8, 0, &p0[0], -1, &p1w[0], new_size ) <= 0 )
          return false;
        ...
        // use the Unicode string as a file name (TRUE: fail if the target already exists)
        return ( CopyFileW( L"old_file", &p1w[0], TRUE ) != FALSE );
    }
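
To make the UTF-8 step above less opaque, here is a minimal, portable sketch of the encoding that a CP_UTF8 conversion performs conceptually. It is a hand-written stand-in, not WideCharToMultiByte itself: the helper name Utf16ToUtf8 is hypothetical, it takes a std::u16string rather than a BSTR so that it compiles outside Windows, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch. The point it illustrates is that every UTF-16 character, Japanese ones included, has a UTF-8 byte sequence, so the result does not depend on the system code page.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a Windows API): encode a UTF-16 string as UTF-8.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Unpaired surrogate: substitute U+FFFD (assumption of this sketch).
            cp = 0xFFFD;
        }

        if (cp < 0x80) {            // 1 byte: ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {    // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {  // 3 bytes: most Japanese characters land here
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                    // 4 bytes: code points from surrogate pairs
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For instance, Utf16ToUtf8(u"\u65E5\u672C") (the two characters of "Japan" in Japanese) yields a six-byte std::string, three bytes per character, which old char-based code can store and pass around unchanged as long as it does not interpret the bytes as ANSI text.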