
Default encoding for VARIANT BSTR to std::string conversion


I have a VARIANT containing a BSTR that was pulled from the MSXML DOM, so it is in UTF-16. I'm trying to figure out which encoding this conversion uses by default:

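VARIANT vtNodeValue;
VariantInit(&vtNodeValue);
pNode->get_nodeValue(&vtNodeValue);
std::string strValue = (char*)_bstr_t(vtNodeValue); // which encoding does this produce?
VariantClear(&vtNodeValue); // free the BSTR owned by the VARIANT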

From testing, I believe the default encoding is either Windows-1252 or ASCII, but I am not sure.

By the way, this is the chunk of code I am fixing: I am changing it to convert the VARIANT to a std::wstring and then to a multibyte encoding with a call to WideCharToMultiByte.

Thanks!


Solution

  • _bstr_t's operator char* calls _com_util::ConvertBSTRToString(). The documentation is pretty unhelpful, but I assume it uses the current locale settings to do the conversion.

    Update:

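    Internally, _com_util::ConvertBSTRToString() calls WideCharToMultiByte, passing 0 for the code page and NULL for the default-character parameters. Because CP_ACP is defined as 0, this means the conversion uses the system's current ANSI code page, not the current thread's code page (that would be CP_THREAD_ACP). On US English and Western European installations the ANSI code page is Windows-1252, which matches what you saw in testing; characters it cannot represent are replaced by a best-fit or default character such as '?', so they are lost.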

    If you want to avoid losing data, you should probably call WideCharToMultiByte directly with CP_UTF8. You can still store the result in a std::string and treat it as a null-terminated string of bytes; you just can't assume that each byte is one character.