Search code examples
calgorithmdata-structureslevenshtein-distanceedit-distance

How to find that two words differ by how much distance>> Is there any shortest way for this


I have read about Levenshtein distance about the calculation of the distance between the two distinct words.

I have one source string and i have to match it with all 10,000 target words. The closest word should be returned.

The problem is I have given a list of 10,000 target words, and input source words is also huge.... So what shortest and efficient algorithm to apply here. Levenshtein distance calculation for each n every combination(brute force logic) would be very time consuming.

Any hints, or ideas are most welcome.


Solution

  • I guess it depends a little on how the words are structured. For example this guy improved the implementation based on the fact that he processes his words in order and does not repeat calculations for common prefixes. But if all your 10,000 words are totally different that won't do you much good. It's written in python so might be a bit of work involved to port to C.

    There are also some kinda homebrew algorithms out there (with which I mean there is no official paper written about it) but that might do the trick.