python, spell-checking

Python Spell Checker


I need a spell checker in Python. I've looked at previous answers and they all seem to be outdated now or not applicable:

Python spell checker using a trie: This question is more about the data structure.

Python Spell Checker: This is a spelling corrector, given two strings.

http://norvig.com/spell-correct.html Often referenced and quite interesting, but also a spelling corrector, and its accuracy isn't quite good enough, though I'll probably use it in combination with a checker.

Spell Checker for Python: Uses pyenchant, which isn't maintained anymore.

Python: check whether a word is spelled correctly: Also suggests pyenchant, which isn't maintained.

Some details of what I need:

  • A function that accepts a string (word) and returns a boolean indicating whether the word is valid English or not. The unit test would want True on an input of "car" and False on an input of "ijjk".
  • Accuracy needs to be above 90%, but it doesn't need to be much higher than that. I'm just using this to exclude words during preprocessing for document classification. Most of the errors (though not all) will be picked up anyway as words that appear too seldom. Spell correcting won't work in all cases because a lot of the errors are OCR issues that are too far off to fix.
  • If it can deal with legal terms that would be a big plus. Otherwise I might need to manually add certain terms to the dictionary.

What's the best approach here? Are there any maintained libraries? Do I need to download a dictionary and check against it?


Solution

  • If you need a simple per-word check, all you need is a corpus of words (preferably one that matches your terminology). Read it into a Python set and test each word for membership, one by one.

    If this naive implementation turns out to have issues, you can then drill down into the concrete problems.
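
A minimal sketch of that set-membership approach might look like the following. The helper names (`load_words`, `make_checker`, `is_valid_word`) and the `extra_terms` parameter are hypothetical, and the sketch assumes a plain-text word list with one word per line. Many Unix-like systems ship one at `/usr/share/dict/words`, but any corpus that fits your domain would do. The tiny inline vocabulary is for demonstration only.

```python
from pathlib import Path


def load_words(path):
    """Read one word per line into a lowercase set, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def make_checker(words, extra_terms=()):
    """Return a function that reports whether a word is in the vocabulary.

    extra_terms is a hypothetical hook for domain vocabulary (for example,
    legal terms) that the base word list does not contain.
    """
    vocabulary = {w.lower() for w in words} | {t.lower() for t in extra_terms}

    def is_valid_word(word):
        # Case-insensitive exact match; no stemming or correction is attempted.
        return word.strip().lower() in vocabulary

    return is_valid_word


# Demonstration vocabulary only. In practice, something like:
#   words = load_words("/usr/share/dict/words")  # assumed path, may not exist
demo_words = {"car", "court", "contract"}
is_valid_word = make_checker(demo_words, extra_terms=["estoppel"])

print(is_valid_word("car"))
print(is_valid_word("ijjk"))
```

Because the lookup is an exact match, inflected forms (plurals, verb tenses) are only accepted if the word list itself contains them, which is one of the concrete problems you may need to handle later.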