string unicode native managed multibyte

Unicode <-> Multibyte conversion (native vs. managed)


I'm trying to convert Unicode strings coming from .NET to native C++ so that I can write them to a text file. The process should then be reversed: the text is read back from the file and converted to a managed Unicode string.

I use the following code:

String^ FromNativeToDotNet(std::string value)
{
  // Convert an ASCII string to a Unicode String
  std::wstring wstrTo;
  wchar_t *wszTo = new wchar_t[value.length() + 1];
  wszTo[value.size()] = L'\0';
  MultiByteToWideChar(CP_UTF8, 0, value.c_str(), -1, wszTo, (int)value.length());
  wstrTo = wszTo;
  delete[] wszTo;

  return gcnew String(wstrTo.c_str());
}


std::string FromDotNetToNative(String^ value)
{ 
  // Pass on changes to native part
  pin_ptr<const wchar_t> wcValue = SafePtrToStringChars(value);
  std::wstring wsValue( wcValue );

  // Convert a Unicode string to an ASCII string
  std::string strTo;
  char *szTo = new char[wsValue.length() + 1];
  szTo[wsValue.size()] = '\0';
  WideCharToMultiByte(CP_UTF8, 0, wsValue.c_str(), -1, szTo, (int)wsValue.length(), NULL, NULL);
  strTo = szTo;
  delete[] szTo;

  return strTo;
}

What happens is that, e.g., a Japanese character gets converted to two ASCII chars (漢 -> "w). I assume that's correct? But the other way does not work: when I call FromNativeToDotNet with "w, I only get "w back as a managed Unicode string. How can I get the Japanese character correctly restored?


Solution

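  • The original code sizes the output buffer by the number of input characters, but a UTF-8 string and its UTF-16 equivalent generally differ in length (漢 is one UTF-16 code unit but three UTF-8 bytes). The buffer is therefore too small, the conversion fails, and the buffer is left holding garbage, so "w is not a valid encoding of 漢. Ask the API for the required size first, then convert. Try this instead: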

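    #include <windows.h>
    #include <string>
    #include <vector>

    using namespace System;

    String^ FromNativeToDotNet(std::string value)
    {
      // Convert a UTF-8 string to a UTF-16 String.
      // First call: query the number of UTF-16 code units required.
      int len = MultiByteToWideChar(CP_UTF8, 0, value.c_str(), (int)value.length(), NULL, 0);
      if (len > 0)
      {
        std::vector<wchar_t> wszTo(len);
        // Second call: perform the conversion into the correctly sized buffer.
        MultiByteToWideChar(CP_UTF8, 0, value.c_str(), (int)value.length(), &wszTo[0], len);
        return gcnew String(&wszTo[0], 0, len);
      }
    
      // Empty input or conversion failure
      return String::Empty;
    }
    
    std::string FromDotNetToNative(String^ value)
    { 
      // Pass on changes to native part
      // (SafePtrToStringChars is the helper from the question)
      pin_ptr<const wchar_t> wcValue = SafePtrToStringChars(value);
    
      // Convert a UTF-16 string to a UTF-8 string.
      // First call: query the number of UTF-8 bytes required.
      int len = WideCharToMultiByte(CP_UTF8, 0, wcValue, value->Length, NULL, 0, NULL, NULL);
      if (len > 0)
      {
        std::vector<char> szTo(len);
        // Second call: perform the conversion into the correctly sized buffer.
        WideCharToMultiByte(CP_UTF8, 0, wcValue, value->Length, &szTo[0], len, NULL, NULL);
        return std::string(&szTo[0], len);
      }
    
      // Empty input or conversion failure
      return std::string();
    }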
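
    To see why the two encodings need differently sized buffers, here is a small portable sketch in standard C++ (no Windows API). Utf16UnitsNeeded and DemoSizes are hypothetical names introduced only for this illustration, and the counting assumes well-formed UTF-8 input. For such input, it should give the same count that the sizing call to MultiByteToWideChar reports.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: number of UTF-16 code units needed to hold a
// well-formed UTF-8 string. Malformed input is not detected.
std::size_t Utf16UnitsNeeded(const std::string& utf8)
{
    std::size_t units = 0;
    for (unsigned char b : utf8)
    {
        if ((b & 0xC0) == 0x80)
            continue;                  // continuation byte: belongs to the previous character
        units += (b >= 0xF0) ? 2 : 1;  // 4-byte sequences become a surrogate pair
    }
    return units;
}

// Usage sketch: the character from the question, written as explicit bytes
// so the result does not depend on the source-file encoding.
void DemoSizes()
{
    const std::string kanji = "\xE6\xBC\xA2";  // U+6F22 (漢) encoded as UTF-8
    assert(kanji.size() == 3);                 // three bytes in UTF-8
    assert(Utf16UnitsNeeded(kanji) == 1);      // one code unit in UTF-16
}
```

    Because the counts differ in both directions, neither value.length() nor value->Length can be reused as the size of the destination buffer.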