I am currently preprocessing several hundred thousand sentences. To improve our ML predictions, we should probably run some sort of autocorrect/spellchecking on the data. However, most of the Python implementations I have found so far are slow. Is there an efficient and easy way to auto-correct an entire text file in Python?
I tried working with https://github.com/phatpiglet/autocorrect/ but it takes relatively long (I did not implement it well, but I guess someone has already done this somewhere).
As @Vishnudev mentioned, prefer using SymSpellCompound
According to benchmarks it's faster than other spelling correction implementations by orders of magnitude. Please refer to this graph
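To give a rough idea of why this family of algorithms is fast, here is a minimal sketch of the symmetric-delete idea that SymSpell is built on. It is not the library's code or API: the function names, the toy vocabulary and the frequencies are all made up for illustration, and it uses plain Levenshtein distance where SymSpell uses a Damerau-Levenshtein variant. The point is that only deletions are generated, once for the dictionary at index time and once per input word at lookup time.

```python
from collections import defaultdict


def deletes(word, max_distance):
    """Return word plus every string made by deleting up to max_distance chars."""
    results = {word}
    frontier = {word}
    for _ in range(max_distance):
        frontier = {w[:i] + w[i + 1:] for w in frontier for i in range(len(w))}
        results |= frontier
    return results


def build_index(vocabulary, max_distance=1):
    """Map every delete-variant of every dictionary term back to that term."""
    index = defaultdict(set)
    for term in vocabulary:
        for variant in deletes(term, max_distance):
            index[variant].add(term)
    return index


def edit_distance(a, b):
    """Plain Levenshtein distance (assumption: SymSpell itself uses a Damerau variant)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        current = [i]
        for j, cb in enumerate(b, 1):
            current.append(min(previous[j] + 1,                # delete from a
                               current[j - 1] + 1,             # insert into a
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]


def lookup(word, index, frequencies, max_distance=1):
    """Return the closest, most frequent dictionary term, or word unchanged."""
    candidates = set()
    for variant in deletes(word, max_distance):
        candidates |= index.get(variant, set())
    # Shared delete-variants only suggest candidates; verify the real distance.
    verified = [t for t in candidates if edit_distance(word, t) <= max_distance]
    if not verified:
        return word
    return min(verified, key=lambda t: (edit_distance(word, t), -frequencies[t]))


# Hypothetical toy dictionary with made-up frequencies.
frequencies = {"spelling": 10, "correct": 8, "sentence": 5}
index = build_index(frequencies)
print([lookup(w, index, frequencies) for w in ["speling", "corect", "sentense"]])
```

The index is built once and reused for every sentence, so the per-word cost is a handful of dictionary lookups rather than a scan over thousands of generated edits.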
If you read the code behind autocorrect, it mentions that it's based on Peter Norvig's implementation available here
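For comparison, a condensed sketch in the style of Norvig's corrector is shown below. It follows the structure of his published approach (generate all edits of the input, keep the known ones, pick the most frequent), but the tiny inline corpus is an assumption made so the snippet runs on its own; the original builds its word counts from a large text file. Every unknown word triggers generation of all deletes, transposes, replaces and inserts, which is where the time goes on large inputs.

```python
import re
from collections import Counter

# Assumption: a tiny inline corpus stands in for a large training text.
CORPUS = """spelling correction is slow when every word is checked
against every edit and the spelling is fixed one word at a time"""

WORDS = Counter(re.findall(r"\w+", CORPUS.lower()))
LETTERS = "abcdefghijklmnopqrstuvwxyz"


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in LETTERS]
    inserts = [L + c + R for L, R in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def known(candidates):
    """Keep only the candidates that appear in the corpus vocabulary."""
    return {w for w in candidates if w in WORDS}


def correction(word):
    """Most frequent known word within two edits, else the word itself."""
    candidates = (known([word])
                  or known(edits1(word))
                  or known(e2 for e1 in edits1(word) for e2 in edits1(e1))
                  or [word])
    return max(candidates, key=lambda w: WORDS[w])


print(correction("speling"), correction("corection"))
```

The two-edit fallback is the expensive part: it expands every one-edit candidate again before filtering against the vocabulary.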
I also tried benchmarking spacy_hunspell, but could not improve the timings by more than 15-20%.
Other avenues for improvement:
Good luck with your task!