Search code examples
pythonrubyalgorithmlevenshtein-distance

Determining if similar string exists or not in an array using Levenshtein distance


I have an array of strings called referenceArray for example. I now have a string str. I want to check if any element in referenceArray is similar to str. I can calculate the Levenshtein distance between each element of referenceArray and str, and pick the element with the minimum distance. But,the problem with this approach is i also need to know if none of the elements in referenceArray are similar to str.So in that case picking the one with minimum L distance would be wrong.

For example,

referenceArray = ['saint louis','new york']
str='st. louis'

In this case i pick 'saint louis' since it has minimum L distance of 4.

But if str='toronto', the one with minimum L distance is 'new york', but the strings are ofcourse totally different. How can i determine if none of the elements in referenceArray match str or if there is a similar string ?

Thank You


Solution

  • How about setting some threshold of acceptable distance? Say, you accept the string with minimal distance only if this distance is lower than 10 or sqrt(len(str)) or something similar.