How to load hunspell dictionary in Windows path with non-ASCII characters?

Hunspell manual suggests:

In WIN32 environment, use UTF-8 encoded paths started with the long path prefix \\?\ to handle system-independent character encoding and very long path names, too.

So I have code to do the following:

QString spell_aff = QStringLiteral(R"(\\?\%1%2.aff)").arg(path, newDict);
QString spell_dic = QStringLiteral(R"(\\?\%1%2.dic)").arg(path, newDict);
// while normally not an issue, you can't mix forward and back slashes with the prefix
// QString::replace() modifies the string in place, so no reassignment is needed
spell_aff.replace(QChar('/'), QStringLiteral("\\"));
spell_dic.replace(QChar('/'), QStringLiteral("\\"));

qDebug() << "right before Hunspell_create";
mpHunspell_system = Hunspell_create(spell_aff.toUtf8().constData(), spell_dic.toUtf8().constData());
qDebug() << "right after Hunspell_create";

This prefixes \\?\ to the path, uses a consistent directory separator as required by the note in the Microsoft documentation, and converts the result to UTF-8 with .toUtf8().

Yet running the code on Windows 10 Pro fails:

[Screenshot: Hunspell loading from path with non-ASCII characters fails]

How to fix?

Using Qt5, MinGW 7.3.0.

I've also done my research and, as far as I can see, LibreOffice does the same thing and it seemingly works for them: see sspellimp.cxx, lingutil.hxx, and lingutil.cxx.


Solution

  • You can use GetShortPathNameW to obtain a pure-ASCII path that Hunspell will understand. See QTIFW-175 for an example.

    (Thanks to the question "Windows directory that will never contain non-ASCII characters for temp file?")
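
A minimal sketch of that approach in plain C++ might look like the following. The helper names `shortPathOrOriginal` and `isPureAscii` are hypothetical (they are not part of Hunspell, Qt, or the Windows API); only `GetShortPathNameW` is the real call. The sketch assumes the dictionary file already exists, since a short name can only be looked up for an existing path, and it falls back to the original path when no short form can be obtained. On volumes where 8.3 short-name generation is disabled, the returned path may still contain non-ASCII characters, so it is worth checking the result before relying on it.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: returns the 8.3 short form of an existing path,
// or the input unchanged if no short form can be obtained.
std::wstring shortPathOrOriginal(const std::wstring& longPath)
{
#ifdef _WIN32
    // With a null buffer, the call returns the required size in wchar_t,
    // including the terminating null, or 0 on failure (e.g. file not found).
    const DWORD needed = GetShortPathNameW(longPath.c_str(), nullptr, 0);
    if (needed == 0)
        return longPath;

    std::wstring buffer(needed, L'\0');
    // On success, the return value is the length written, excluding the null.
    const DWORD written = GetShortPathNameW(longPath.c_str(), &buffer[0], needed);
    if (written == 0 || written >= needed)
        return longPath;

    buffer.resize(written);
    return buffer;
#else
    // Not on Windows: there are no short names, so return the path as-is.
    return longPath;
#endif
}

// Hypothetical helper: true if every character is in the 7-bit ASCII range.
bool isPureAscii(const std::wstring& s)
{
    for (wchar_t c : s) {
        if (static_cast<unsigned long>(c) > 0x7F)
            return false;
    }
    return true;
}
```

With Qt, the question's paths could be passed through this helper as `QString::fromStdWString(shortPathOrOriginal(spell_aff.toStdWString()))`, built without the \\?\ prefix, and the result handed to Hunspell_create only if `isPureAscii` reports true.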