I have code that passes a UTF-8 encoded JSON string via a COM VARIANT, specifically using CComVariant. Everything worked well until my software was installed on a Japanese user's computer; I suppose he is running the Japanese version of Windows 7. Somehow that Windows changed the byte sequence for one non-ASCII character and broke the JSON formatting.
The problem is with this combination:
"nić" (bytes: 0x22 0x6E 0x69 0xC4 0x87 0x22)
After it was packed into a CComVariant and then unpacked again, the string above changed to:
"niāE (bytes: 0x22 0x6E 0x69 0xC4 0x81 0x45)
I.e. the combination ć" became āE.
My code works as follows (simplified version):
void get_json(VARIANT *out)
{
    const std::string json = "\"nić\"";
    CComVariant result = json.c_str();
    result.Detach(out);
}
Then, in another part of the code:
CComVariant varJson;
get_json(&varJson);
const std::string utf8json = std::string(CStringA(varJson));
// At this point utf8json is not the same as original json above
// and cannot be decoded properly by JSON parser.
It seems I misunderstood something about CStringA and COM VARIANTs, and passing UTF-8 bytes this way was not safe. I can't reproduce the problem on a Western European version of Windows; it is somehow related to the Japanese version.
The problem was explained in the comments. As for the solution, because you're using std (there are plenty of other solutions), I suggest you use the widen function defined in this answer on SO: Is this code safe using wstring with MultiByteToWideChar?
and change the code to:
CComVariant result = widen(json).c_str();
Checking under the debugger before and after the change shows that the VARIANT (or its contained BSTR) is now fine.
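As for why only that one pair of bytes was damaged: the symptom is consistent with the UTF-8 bytes being interpreted in the system ANSI code page, which I am assuming is code page 932 (Shift-JIS) on that Japanese machine. There, 0xC4 is a single-byte character and passes through, but 0x87 is a lead byte and the closing quote 0x22 is not a valid trail byte, so the pair cannot be decoded. The bytes that came back, 0x81 0x45, encode the katakana middle dot in that code page, which looks like a substitution character. The sketch below only illustrates that reasoning; the function names are hypothetical and it does not call any Windows API.

```cpp
#include <cstddef>
#include <string>

// Lead bytes of code page 932 start a two-byte character: 0x81-0x9F, 0xE0-0xFC.
bool is_cp932_lead_byte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Valid second bytes of a two-byte character: 0x40-0x7E, 0x80-0xFC.
bool is_cp932_trail_byte(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

// Returns the offset of the first lead byte that is not followed by a valid
// trail byte, or std::string::npos if the bytes scan cleanly as CP932.
std::size_t first_broken_cp932_pair(const std::string& bytes)
{
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (!is_cp932_lead_byte(b))
            continue;   // single-byte character, nothing to pair
        if (i + 1 >= bytes.size() ||
            !is_cp932_trail_byte(static_cast<unsigned char>(bytes[i + 1])))
            return i;   // lead byte without a valid partner
        ++i;            // skip the trail byte of a valid pair
    }
    return std::string::npos;
}
```

For the string from the question, the scan stops at the 0x87 byte, which is exactly where the corruption starts.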
Note: if you need a byte-string equivalent of this VARIANT or of a BSTR (do you really?), don't convert it back with broken code like std::string(CStringA(varJson)). Again, use a reverse equivalent of widen, based on WideCharToMultiByte this time.