I have a VARIANT holding a BSTR that was pulled from the MSXML DOM, so it is in UTF-16. I'm trying to figure out which encoding this conversion uses by default:
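```cpp
VARIANT vtNodeValue;
VariantInit(&vtNodeValue);                            // start from VT_EMPTY
pNode->get_nodeValue(&vtNodeValue);
std::string strValue = (char*)_bstr_t(vtNodeValue);   // which code page is used here?
VariantClear(&vtNodeValue);                           // free the BSTR owned by the VARIANT
```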
From testing, I believe the default encoding is either Windows-1252 or ASCII, but I'm not sure.
Btw, this is the chunk of code I am fixing: I'm converting the variant to a wstring and then to a multi-byte encoding with a call to WideCharToMultiByte.
Thanks!
The `operator char*` method calls `_com_util::ConvertBSTRToString()`. The documentation is pretty unhelpful, but I assume it uses the current locale settings to do the conversion.
Update:
Internally, `_com_util::ConvertBSTRToString()` calls `WideCharToMultiByte`, passing zero for all the code-page and default-character parameters. This is the same as passing `CP_ACP`, which means to use the system's current ANSI code-page setting (not the current thread setting).
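To make the data loss concrete, here is a rough, portable model of what a `CP_ACP` conversion does on a machine whose ANSI code page is Windows-1252. `ToAnsi1252Model` is a hypothetical helper, not a Windows API. It is deliberately simplified: it only maps U+0000-U+007F and U+00A0-U+00FF (which keep the same byte values in Windows-1252), so it ignores the 0x80-0x9F block (for example, it treats the euro sign as unmappable even though Windows-1252 has it at 0x80) and it ignores Windows' best-fit mappings. Anything it can't map becomes `'?'`, the code page's default character.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, NOT a Windows API: a simplified model of
// WideCharToMultiByte(CP_ACP, 0, ...) on a Windows-1252 system.
// Only U+0000-U+007F and U+00A0-U+00FF are treated as mappable; the
// 0x80-0x9F block and best-fit mappings are intentionally left out.
std::string ToAnsi1252Model(const std::u16string& utf16, bool* usedDefault) {
    std::string out;
    if (usedDefault) *usedDefault = false;
    for (char16_t c : utf16) {
        if (c < 0x80 || (c >= 0xA0 && c <= 0xFF)) {
            // Same numeric value in UTF-16 and in this part of Windows-1252.
            out.push_back(static_cast<char>(c));
        } else {
            // Unmappable in this model: substitute the default character,
            // one '?' per UTF-16 code unit, and report that it happened.
            out.push_back('?');
            if (usedDefault) *usedDefault = true;
        }
    }
    return out;
}
```

For example, `u"na\u00efve"` survives as plain bytes (the ï becomes 0xEF), but Japanese text such as `u"\u65e5\u672c"` comes back as `"??"`, and nothing in the resulting `std::string` lets you recover the original characters.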
If you want to avoid losing data, you should probably call `WideCharToMultiByte` directly and use `CP_UTF8`. You can still store the result in a `std::string` and treat it as a null-terminated `char` string; you just can't treat each byte as a character.
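For illustration, here is a portable sketch of the UTF-16 to UTF-8 encoding that `CP_UTF8` performs. `Utf16ToUtf8` is a hypothetical helper written so it compiles anywhere; on Windows you would instead call `WideCharToMultiByte(CP_UTF8, 0, ...)` on the BSTR (for `CP_UTF8`, the `lpDefaultChar` and `lpUsedDefaultChar` arguments must be NULL). Replacing unpaired surrogates with U+FFFD is this sketch's own choice.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: portable UTF-16 -> UTF-8 encoder, standing in for
// WideCharToMultiByte(CP_UTF8, 0, ...). Unpaired surrogates become U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High + low surrogate pair -> one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate -> replacement character
        }
        if (cp < 0x80) {             // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {     // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {   // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                     // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Nothing is lost here, but `Utf16ToUtf8(u"caf\u00e9").size()` is 5: four characters, five bytes. That is what "you can't treat bytes as characters" means in practice; `size()`, indexing and substring operations all work in bytes.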