Tags: c++, string, character-encoding, c++-cli, marshalling

Converting Managed String to std::string using marshal_context


I am aware of the post "Converting managed System::String to std::string in C++/CLI", which covers this conversion. However, I came across the following code, which uses marshal_context instead, and I am trying to understand how it works.

#include <msclr/marshal.h>   // msclr::interop::marshal_context
#include <string>            // std::string

System::String^ str = gcnew System::String(L"\u0105");  // U+0105 LATIN SMALL LETTER A WITH OGONEK
msclr::interop::marshal_context ctx;
auto constChars = ctx.marshal_as<const char*>(str);
std::string myString(constChars);

If I am not mistaken, str is a single "character" represented in 16 bits using UTF-16, which according to the Unicode character list is the small Latin letter a with an ogonek (ą). But myString comes out as the single character '?'. How does this conversion happen?

Moreover, why does the code work as "expected" when str is created from an ASCII character, say a? In UTF-16, a is represented in 16 bits, with the most or least significant 8 bits (depending on endianness) all 0. Why, then, does myString contain only the single char a?


Solution

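  • A std::string is a sequence of chars, and a char is typically 8 bits. marshal_as<const char*> therefore does not copy the UTF-16 data as-is; it converts the string to a narrow encoding, the current ANSI code page. A character that the code page cannot represent, such as U+0105 on a Western European (Windows-1252) system, is replaced by '?'.

    ASCII characters such as a have a one-byte representation in the ANSI code pages, so they come through as a single char. Because the conversion re-encodes each character instead of copying raw bytes, the zero byte of the UTF-16 code unit never ends up in the result.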

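    To keep the character, use std::wstring, a sequence of wchar_t. On Windows, wchar_t is 16 bits and holds UTF-16 code units, the same encoding System::String uses, so nothing is lost in the conversion.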

    Therefore, change your last two lines to:

    //-------------------------------------vvvvvvv--------
    auto constChars = ctx.marshal_as<const wchar_t*>(str);
    
    //---vvvvvvv----------------------
    std::wstring myString(constChars);