Tags: python, node.js, string, spell-checking

Leveraging Spell Checker on local machine?


I notice that common applications on a given machine (Mac, Linux, or Windows) have their own spell checkers: everything from various IDEs, to MS Word/Office, to note-taking software.

I am trying to use the built-in utility of the machine to check strings for correct spelling. It seems that I can't just use what is already on the machine and would likely have to download a dictionary to compare against.

I am not sure whether there is a better way to accomplish this. I would prefer to do things locally, but I am not opposed to making API or curl requests to determine whether the words in a string are spelled correctly.

I was looking at:

  • LanguageTool ("hello wrold" failed to return an error)
  • Google's tbproxy, which no longer seems to be functional
  • Dictionary / Merriam-Webster, which require API keys for automation

I was looking at Node packages and noticed spell checker modules which encapsulate wordlists as well.

Is there a way to use the built-in machine dictionaries at all, or would it be better to download a dictionary / wordlist to compare against?

I am thinking a wordlist might be the best bet, but I didn't want to reinvent the wheel. What have others done to accomplish something similar?


Solution

  • Your question is tagged as both NodeJS and Python. This is the NodeJS-specific part, but I imagine it's very similar in Python.


    Windows (from Windows 8 onward) and Mac OS X do have built-in spellchecking engines.

    • Windows: The "Windows Spell Checking API" is a C/C++ API. To use it with NodeJS, you'll need to create a binding.
    • Mac OS X: "NSSpellChecker" is part of AppKit, used for GUI applications. This is an Objective-C API, so again you'll need to create a binding.
    • Linux: There is no "OS specific" API here. Most applications use Hunspell but there are alternatives. This again is a C/C++ library, so bindings are needed.

    Fortunately, there is already a module called spellchecker which has bindings for all of the above. This will use the built-in system for the platform it's installed on, but there are multiple drawbacks:

    1) Native extensions must be built. This one ships prebuilt binaries via node-pre-gyp, but these are installed for a specific platform. If you develop on Mac OS X, run npm install to get the package, and then deploy your application on Linux (with the node_modules directory), it won't work.

    2) Using built-in spellchecking means using defaults dictated by the OS, which might not be what you want. For example, the language used might be dictated by the selected OS language. For a UI application (for example one built with Electron) this might be fine, but if you want to do server-side spellchecking in languages other than the OS language, it might prove difficult.
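To illustrate the first drawback, here is a minimal sketch that guards the load of the native module and signals when a fallback is needed. It assumes the spellchecker package exposes `isMisspelled(word)`, as its README describes; `loadNativeChecker` and `findMisspelledNative` are hypothetical helper names, not part of any library.

```javascript
// Try to load the native `spellchecker` module. It may be missing, or its
// binary may have been built for a different platform than this one.
function loadNativeChecker() {
  try {
    return require('spellchecker');
  } catch (err) {
    return null; // not installed, or the native binding failed to load
  }
}

// Returns the misspelled entries of `words`, or null when no native checker
// is available (the caller should then fall back to a wordlist).
// `checker` only needs an isMisspelled(word) method (assumed API).
function findMisspelledNative(words, checker = loadNativeChecker()) {
  if (!checker) return null;
  return words.filter((word) => checker.isMisspelled(word));
}

// Result depends on the OS dictionary and language settings.
console.log(findMisspelledNative(['hello', 'wrold']));
```

A `null` result here means the native route is unavailable on this machine, which is exactly the deployment problem described in point 1.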


    At the basic level, spellchecking some text boils down to:

    1. Tokenizing the string (e.g. by spaces)
    2. Checking every token against a list of known correct words
    3. (Bonus) Gather suggestions for wrong tokens and give the user options.
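The three steps above can be sketched in plain JavaScript as follows. This is only an illustration under stated assumptions: `KNOWN_WORDS` is a tiny made-up word list, the tokenizer only handles ASCII letters, and suggestions are ranked by plain Levenshtein distance, which is one simple choice and not necessarily what a real spellchecker does.

```javascript
// Hypothetical, tiny word list; a real checker would load a full dictionary.
const KNOWN_WORDS = new Set(['hello', 'world', 'word', 'spell', 'checker']);

// Step 1: lowercase, then split on anything that is not a letter or apostrophe.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z']+/).filter(Boolean);
}

// Levenshtein edit distance between two strings (row-by-row dynamic programming).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Step 3: known words within `maxDistance` edits, closest first.
function suggest(token, words = KNOWN_WORDS, maxDistance = 2) {
  return [...words]
    .map((word) => ({ word, distance: editDistance(token, word) }))
    .filter((entry) => entry.distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance)
    .map((entry) => entry.word);
}

// Steps 1-3 combined: report every unknown token with its suggestions.
function checkText(text, words = KNOWN_WORDS) {
  return tokenize(text)
    .filter((token) => !words.has(token)) // Step 2: lookup in the word list
    .map((token) => ({ token, suggestions: suggest(token, words) }));
}

console.log(checkText('Hello wrold'));
```

Comparing every token against every known word is fine for a sketch, but it scales poorly with a real dictionary, which is one reason to reach for an existing library.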

    You can write part 1 yourself. Parts 2 and 3 require a "list of known correct words", i.e. a dictionary. Fortunately, there are already a format and tools to work with it:

    • simple-spellchecker can work with .dic-files.
    • nspell is a JS implementation of Hunspell, complete with its own dictionary packages.
    • Additional dictionaries can be found, for example, in this repo

    With this, you get to choose the language, you don't need to build/download any native code and your application will work the same on every platform. If you're spellchecking on the server, this might be your most flexible option.
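To show why no native code is needed for this route, here is a rough sketch that reads the text of a Hunspell-style .dic file into a Set. It is a deliberate simplification: the flags after "/" are discarded and the companion .aff rules are never applied, so inflected forms that Hunspell or nspell would accept are not recognized. `parseDic` and `SAMPLE_DIC` are hypothetical names used only for illustration.

```javascript
// Parse the text of a Hunspell .dic file into a Set of lowercase base words.
// Simplification: affix flags are dropped and .aff rules are ignored.
function parseDic(dicText) {
  const lines = dicText.split(/\r?\n/);
  // The first line of a .dic file holds an (approximate) entry count; skip it.
  const entries = /^\d+\s*$/.test(lines[0]) ? lines.slice(1) : lines;
  const words = new Set();
  for (const line of entries) {
    // Keep only the part before the flag separator "/" or a tab.
    const word = line.split(/[\/\t]/)[0].trim();
    if (word) words.add(word.toLowerCase());
  }
  return words;
}

// Made-up sample in .dic layout: a count line, then one entry per line.
const SAMPLE_DIC = ['3', 'hello', 'world/S', 'spell/GDS'].join('\n');
const sampleWords = parseDic(SAMPLE_DIC);

console.log(sampleWords.has('world'), sampleWords.has('wrold'));
```

In a real application the file contents would come from something like `fs.readFileSync(path, 'utf8')`, and a library such as nspell is the better choice once affix handling and suggestions matter.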