Search code examples
data-structureslevenshtein-distance

Way to implement "Get all strings with Levenshtein distance less than X"


I'm wondering whether there's an efficient data structure to perform "Retrieve all strings with levenshtein distance less than X".

Few things I'm interested in:

  • Explanation of the algorithm.
  • Is there an existing implementation in existing database / programming langauge?
  • Paper / article that I can refer to?

Solution

  • this is nearest neighborer search in a metric space with levenshtein distance as the metric (or distance) function

    a VP-tree is one of the ways of solving that problem

    this Python VP-tree implementation is a working demo that shows how a VP-tree works run it on say a word list it provides an interactive shell where you type a word and it returns the words in that list that are no more then X distance from the word you typed