I've integrated Hunspell in an unmanaged C++ app on Windows 7 using Visual Studio 2010.
I've got spell checking and suggestions working for English, but now I'm trying to get things working for Spanish and hitting some snags. Whenever I get suggestions for Spanish, the suggestions containing accented characters are not converting properly to std::wstring objects.
Here is an example of a suggestion that comes back from the Hunspell->suggest method:
Here is the code I'm using to translate that std::string to a std::wstring:
    std::wstring StringToWString(const std::string& str)
    {
        std::wstring convertedString;
        int requiredSize = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, 0, 0);
        if(requiredSize > 0)
        {
            std::vector<wchar_t> buffer(requiredSize);
            MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, &buffer[0], requiredSize);
            convertedString.assign(buffer.begin(), buffer.end() - 1);
        }
        return convertedString;
    }
And after I run that through I get this, with the funky character on the end.
Can anyone help me figure out what could be going on with the conversion here? My guess is that it's related to the negative char returned from Hunspell, but I don't know how to convert that into something the std::wstring conversion code can handle.
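It looks like the output of Hunspell is not UTF-8 but is encoded with code page 28591 (ISO 8859-1 Latin 1; Western European (ISO)), which I found by looking at the Hunspell default settings for the Unix command line utility. In that encoding each accented character is a single byte above 0x7F, which shows up as a negative char and is not a valid UTF-8 sequence on its own.
Changing CP_UTF8 to 28591 worked for me.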
    // Updated code page to 28591 from CP_UTF8
    std::wstring StringToWString(const std::string& str)
    {
        std::wstring convertedString;
        int requiredSize = MultiByteToWideChar(28591, 0, str.c_str(), -1, 0, 0);
        if(requiredSize > 0)
        {
            std::vector<wchar_t> buffer(requiredSize);
            MultiByteToWideChar(28591, 0, str.c_str(), -1, &buffer[0], requiredSize);
            convertedString.assign(buffer.begin(), buffer.end() - 1);
        }
        return convertedString;
    }
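As a side note on the "negative char" guess from the question: ISO 8859-1 maps every byte value directly to the Unicode code point with the same number, so for this one encoding the conversion can also be sketched without the Windows API. The helper name Latin1ToWString below is made up for illustration and is not part of Hunspell. It only gives correct results if the input really is ISO 8859-1.

```cpp
#include <string>

// Hypothetical helper (not part of Hunspell or the Windows API):
// widens ISO 8859-1 bytes to wchar_t one at a time.
// Assumes the input really is ISO 8859-1; other encodings need a real decoder.
std::wstring Latin1ToWString(const std::string& str)
{
    std::wstring converted;
    converted.reserve(str.size());
    for (std::string::size_type i = 0; i < str.size(); ++i)
    {
        // Cast through unsigned char first so bytes above 0x7F, which are
        // negative when char is signed, are not sign-extended.
        converted.push_back(
            static_cast<wchar_t>(static_cast<unsigned char>(str[i])));
    }
    return converted;
}
```

The cast through unsigned char is the important part: converting a negative char straight to wchar_t would sign-extend it into an unrelated value.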
Here is a list of code pages from MSDN that helped me find the correct code page integer.
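Rather than hard-coding 28591, the code page could be picked from the encoding the dictionary itself declares (the SET line in its .aff file, which I believe Hunspell reports through get_dic_encoding(); treat that as an assumption and check it against your Hunspell version). A minimal sketch of such a lookup follows. CodePageForEncoding is a made-up name, and the table covers only a few common encodings.

```cpp
#include <string>

// Hypothetical helper: maps an encoding name, as written on the SET line of a
// Hunspell .aff file, to a Windows code page identifier.
// Only a handful of names are covered; returns -1 for anything unrecognized.
int CodePageForEncoding(const std::string& encoding)
{
    if (encoding == "UTF-8")      return 65001; // same value as CP_UTF8
    if (encoding == "ISO8859-1")  return 28591; // Latin 1, Western European
    if (encoding == "ISO8859-2")  return 28592; // Latin 2, Central European
    if (encoding == "ISO8859-15") return 28605; // Latin 9
    return -1;
}
```

The result would then be passed as the first argument to MultiByteToWideChar in place of the literal 28591, after checking for -1.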