c++ utf-8 com variant

How to marshal a UTF-8 byte string via a COM VARIANT?


I have code that passes a UTF-8 encoded JSON string through a COM VARIANT, specifically using CComVariant. Everything worked well until my software was installed on the computer of a Japanese user, who I suppose is running the Japanese version of Windows 7. Somehow that Windows changed the byte sequence around one non-ASCII character and broke the JSON formatting.

The problematic string:

"nić" (bytes: 0x22 0x6E 0x69 0xC4 0x87 0x22)

After being packed into a CComVariant and unpacked again, the string above changed to:

"niāE (bytes: 0x22 0x6E 0x69 0xC4 0x81 0x45)

I.e. the combination ć" became āE.

My code works as follows (simplified version):

void get_json(VARIANT *out)
{
    const std::string json = "\"nić\"";
    CComVariant result = json.c_str();
    result.Detach(out);
}

Then, in another part of the code:

CComVariant varJson;
get_json(&varJson);
const std::string utf8json = std::string(CStringA(varJson));
// At this point utf8json is not the same as original json above
// and cannot be decoded properly by JSON parser.

It seems I misunderstood something about CStringA and COM VARIANTs, and passing UTF-8 bytes this way was not safe. I can't reproduce the problem on Western European versions of Windows, so it seems to be specific to the Japanese version.
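
For reference, the byte change above is what one would expect if the UTF-8 bytes were interpreted somewhere as code page 932 (the Japanese ANSI code page). There 0xC4 is a complete single-byte character, while 0x87 is the lead byte of a two-byte character and 0x22 is not a valid trail byte. The sketch below only classifies bytes by the CP932 lead and trail ranges. cp932_units is a hypothetical helper, and treating a bad pair as one two-byte unit is an assumption chosen to match the observed output, not a reimplementation of what Windows does.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Lead-byte and trail-byte ranges of code page 932 (Windows Shift-JIS).
bool is_cp932_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

bool is_cp932_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// Hypothetical helper: group a byte string the way a CP932 decoder might.
// One entry per unit: 1 = single byte, 2 = valid two-byte pair,
// -2 = lead byte followed by a missing or invalid trail byte.
// Assumption: a bad pair is consumed as one two-byte unit.
std::vector<int> cp932_units(const std::string& bytes)
{
    std::vector<int> units;
    for (std::size_t i = 0; i < bytes.size();) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (!is_cp932_lead(b)) {
            units.push_back(1);
            i += 1;
            continue;
        }
        const bool ok = i + 1 < bytes.size() &&
                        is_cp932_trail(static_cast<unsigned char>(bytes[i + 1]));
        units.push_back(ok ? 2 : -2);
        i += 2;
    }
    return units;
}
```

Fed the six bytes of "nić", this groups the final 0x87 0x22 into a single invalid unit. That fits the closing quote disappearing from the round-tripped string.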


Solution

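  • The problem was explained in the comments. The underlying issue is that a string in a VARIANT is a BSTR, which holds UTF-16 text, and constructing a CComVariant from a const char* converts the bytes using the ANSI code page (932 on Japanese Windows) instead of treating them as UTF-8. As for the solution, since you're already using std (there are plenty of other options), I suggest you use the widen function defined in this answer on SO, "Is this code safe using wstring with MultiByteToWideChar?", and change the code to: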

    CComVariant result = widen(json).c_str();
    

    Let's check under the debugger, before and after the change:

    [debugger screenshots: before / after]

    Now the VARIANT (or rather its contained BSTR) is fine.

    Note: if you need a byte-string equivalent of this VARIANT or of a BSTR (do you really?), don't convert it back with broken code like std::string(CStringA(varJson)). Again, use a reverse equivalent of widen, this time based on WideCharToMultiByte.
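
As a rough idea of what such a pair of helpers might look like, here is a minimal sketch. It is my own illustration, not the code from the linked answer: the name narrow is hypothetical, and the bodies simply call MultiByteToWideChar and WideCharToMultiByte with CP_UTF8. The #else branches are a simplified fallback (assuming a 32-bit wchar_t and skipping overlong and surrogate checks) that exists only so the sketch compiles outside Windows.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// UTF-8 bytes -> wide string (UTF-16 on Windows).
std::wstring widen(const std::string& utf8)
{
    if (utf8.empty()) return std::wstring();
#ifdef _WIN32
    const int in_len = static_cast<int>(utf8.size());
    // First call computes the required length, second call converts.
    const int out_len = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                            utf8.data(), in_len, nullptr, 0);
    if (out_len <= 0) throw std::runtime_error("widen: invalid UTF-8");
    std::wstring out(static_cast<std::size_t>(out_len), L'\0');
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                        utf8.data(), in_len, &out[0], out_len);
    return out;
#else
    // Fallback: decode each UTF-8 sequence into one wchar_t code point.
    std::wstring out;
    for (std::size_t i = 0; i < utf8.size();) {
        const unsigned char b = static_cast<unsigned char>(utf8[i]);
        const int extra = b < 0x80 ? 0 : (b >> 5) == 0x06 ? 1
                        : (b >> 4) == 0x0E ? 2 : (b >> 3) == 0x1E ? 3 : -1;
        if (extra < 0 || i + static_cast<std::size_t>(extra) >= utf8.size())
            throw std::runtime_error("widen: invalid UTF-8");
        unsigned long cp = extra == 0 ? b : (b & (0x3F >> extra));
        for (int k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(utf8[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::runtime_error("widen: invalid UTF-8");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += static_cast<std::size_t>(extra) + 1;
    }
    return out;
#endif
}

// Wide string -> UTF-8 bytes (the reverse of widen).
std::string narrow(const std::wstring& wide)
{
    if (wide.empty()) return std::string();
#ifdef _WIN32
    const int in_len = static_cast<int>(wide.size());
    // With CP_UTF8 the last two arguments must be null.
    const int out_len = WideCharToMultiByte(CP_UTF8, 0, wide.data(), in_len,
                                            nullptr, 0, nullptr, nullptr);
    if (out_len <= 0) throw std::runtime_error("narrow: conversion failed");
    std::string out(static_cast<std::size_t>(out_len), '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), in_len,
                        &out[0], out_len, nullptr, nullptr);
    return out;
#else
    // Fallback: each wchar_t is taken as one code point.
    std::string out;
    for (const wchar_t wc : wide) {
        const unsigned long cp = static_cast<unsigned long>(wc);
        if (cp > 0x10FFFF) throw std::runtime_error("narrow: invalid code point");
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
#endif
}
```

On the receiving side, something along the lines of narrow(std::wstring(varJson.bstrVal, SysStringLen(varJson.bstrVal))) should then give back the original UTF-8 bytes, assuming varJson.vt == VT_BSTR and a non-null bstrVal. Neither direction depends on the machine's ANSI code page, which is the point of the exercise.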