Tags: c++, ascii, hunspell

Handling Hunspell suggestions with special characters


I've integrated Hunspell in an unmanaged C++ app on Windows 7 using Visual Studio 2010.

I've got spell checking and suggestions working for English, but now I'm trying to get things working for Spanish and hitting some snags. Whenever I get suggestions for Spanish, the ones containing accented characters do not convert properly to std::wstring objects.

Here is an example of a suggestion that comes back from the Hunspell->suggest method:

[Image: Hunspell->suggest(...) result]

Here is the code I'm using to convert that std::string to a std::wstring:

std::wstring StringToWString(const std::string& str)
{
    std::wstring convertedString;
    int requiredSize = MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, 0, 0);
    if(requiredSize > 0)
    {
        std::vector<wchar_t> buffer(requiredSize);
        MultiByteToWideChar(CP_UTF8, 0, str.c_str(), -1, &buffer[0], requiredSize);
        convertedString.assign(buffer.begin(), buffer.end() - 1);
    }

    return convertedString;
}

And after I run that through I get this, with the funky character on the end.

[Image: After conversion to wstring]

Can anyone help me figure out what could be going on with the conversion here? I have a guess that it's related to the negative char returned from hunspell, but don't know how I can convert that to something for the std::wstring conversion code.


Solution

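  • It looks like the output of Hunspell here is not UTF-8 but single-byte text in code page 28591 (ISO 8859-1 Latin 1; Western European (ISO)), which I found by looking at the Hunspell default settings for the Unix command-line utility. ISO 8859-1 is a superset of ASCII rather than ASCII itself, which is why the unaccented English suggestions converted fine while the accented Spanish ones did not.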

    Changing the CP_UTF8 to 28591 worked for me.

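    // Updated code page to 28591 (ISO 8859-1) from CP_UTF8
    std::wstring StringToWString(const std::string& str)
    {
        std::wstring convertedString;
        // First call only measures: with a length of -1 the returned size
        // includes the terminating null.
        int requiredSize = MultiByteToWideChar(28591, 0, str.c_str(), -1, 0, 0);
        if(requiredSize > 0)
        {
            std::vector<wchar_t> buffer(requiredSize);
            // Second call performs the conversion into the buffer.
            MultiByteToWideChar(28591, 0, str.c_str(), -1, &buffer[0], requiredSize);
            // Drop the terminating null when copying into the wstring.
            convertedString.assign(buffer.begin(), buffer.end() - 1);
        }

        return convertedString;
    }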
    

    The list of code page identifiers on MSDN helped me find the correct code page integer.
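
As for the guess in the question about negative chars: it points in the right direction. In ISO 8859-1 every byte value is equal to its Unicode code point, but where char is signed (the Visual C++ default) any byte above 0x7F reads as a negative number. As a minimal, portable sketch of what the 28591 conversion should amount to, the hypothetical helper below casts each byte through unsigned char before widening it. Latin1ToWString is my own name, not a Hunspell or Windows function, and it assumes the input really is ISO 8859-1; for a dictionary in any other encoding it would give wrong results.

```cpp
#include <string>

// Hypothetical helper (not part of Hunspell or the Windows API).
// Assumes str holds ISO 8859-1 (Latin-1) bytes, where each byte value
// is identical to the Unicode code point it represents.
std::wstring Latin1ToWString(const std::string& str)
{
    std::wstring converted;
    converted.reserve(str.size());
    for (std::string::size_type i = 0; i < str.size(); ++i)
    {
        // Go through unsigned char first: with a signed char, a byte such
        // as 0xF1 is negative and would sign-extend to the wrong wchar_t.
        const unsigned char byte = static_cast<unsigned char>(str[i]);
        converted.push_back(static_cast<wchar_t>(byte));
    }
    return converted;
}
```

For example, the four bytes "ni\xF1o" come out as a four-character wide string with U+00F1 (ñ) in the third position. On Windows the MultiByteToWideChar version above is still the more general choice, since it handles whichever code page you pass it.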