I have an array of strings called referenceArray for example. I now have a string str. I want to check if any element in referenceArray is similar to str. I can calculate the Levenshtein distance between each element of referenceArray and str, and pick the element with the minimum distance. But,the problem with this approach is i also need to know if none of the elements in referenceArray are similar to str.So in that case picking the one with minimum L distance would be wrong.
For example,
referenceArray = ['saint louis','new york']
str='st. louis'
In this case i pick 'saint louis' since it has minimum L distance of 4.
But if str='toronto'
, the one with minimum L distance is 'new york'
, but the strings are ofcourse totally different. How can i determine if none of the elements in referenceArray match str or if there is a similar string ?
Thank You
How about setting some threshold of acceptable distance? Say, you accept the string with minimal distance only if this distance is lower than 10
or sqrt(len(str))
or something similar.