Search code examples
javasearch-suggestionmisspelling

Lightweight library cappable of suggesting different spellings of words from a bounded set?


I was looking for lightweight library that'd allow me to feed it a bunch of words, and then ask it whether a given word would have any close matches.z

I'm not particularly concerned with the underlying algorithm (I reckon a simple hamming distance algorithm would probably suffice, were I to undertake the task myself).

I'm just in the development of a small language and I found it nifty to make suggestions to the user when an "Undefined class" error is detected (lots of times it's just a misspelled word). I don't want to lose much time on the issue though.

Thanks


Solution

  • Levenshtein distance is a common way of handling it. Just add all the words to a list and then brute-force iterate over it and return the smallest distance. Here's one library with a Levenschtein function: http://commons.apache.org/lang/api-2.4/org/apache/commons/lang/StringUtils.html

    If you have a large number of words and you want it to run fast, then you'd have to use ngrams. Spilt each word into bigrams and then add (bigram, word) to a map. Use the map to look up the bigrams in the target word, and then iterate through the candidates. That's probably more work than you want to do, though.