Search code examples
javascriptprobabilitystring-comparison

Creating an algorithm to define matching probability of strings in Javascript


I am working on google map and trying to find specified place. But, sometimes I get more results of specified place (as I am trying to search the place with its name in whole city). So I have limited the output to first three matched suggestions and excluded all results beyond that. But the issue is, there is still possibility that first 3 matched suggestions are of same place and I want to show only one suggestion from them to make the suggestions precise. For Example:

  • I have searched for "Pizza Hut"
  • I get first three results as follows:
    1. Pizza Hut, MI Road, Jaipur --- 1st suggestion
    2. Pizza Hut, MI Road, Ajmeri Gate, Jaipur --- 2nd suggestion
    3. Pizza Hut, Malviya Nagar, Jaipur --- 3rd suggestion

1st and 2nd suggestions are indicating same place.

Now I want to apply prediction approach (using probability) to find that 1st and 2nd suggestions are same places and 3rd is different place.

What was my approach:-

var p = [], ap = []; //p -- places & ap -- array of splitted strings
    p[0] = "Pizza Hut, MI Road, Jaipur";
    p[1] = "Pizza Hut, MI Road, Ajmeri Gate, Jaipur";
    p[2] = "Pizza Hut, Malviya Nagar, Jaipur";

    //split all places
    ap[0] = p[0].split(",");
    ap[1] = p[1].split(",");
    ap[2] = p[2].split(",");

    /*
     --- Theoretically ---
     ### Splitting strings into symbols ###
     string_symbols_1 = a1, a2, a3;  --- a1 = "Pizza Hut", a2 = "MI Road", a3="Jaipur"; 
     string_symbols_2 = b1, b2, b3, b4; --- b1 = "Pizza Hut", b2 = "MI Road", b3 = "Ajmeri 
     Gate", b4 = "Jaipur"
     string_symbols_3 = c1, c2, c3; --- c1 = "Pizza Hut", c2 = "Malviya Nagar", c3 = 
     "Jaipur"

     ### On Prediction Basis ###
     I am trying to evaluate that if 60% of the symbols match with 
     another string symbols then there is probability that both strings are 
     same. 
     From above case I am considering if I am able to find >40% unique 
     symbols in both strings (that is being compared) then there is 
     probability that both strings are unique. (It will reduce the 60% 
     comparison to 40% comparison in best cases).
     Once found the unique strings return their indexes;
    */

    //pseudo implementation
    function findUniquePlaces(ap){
       //stuck here..
       //now match the splitted string arrays to find the unique places 
       //what should be the logic
       return index(0 and 2)
    }

I know how can I implement this. But I want to know what would be the best possible way to implement the same. I want to ensure that this task mustn't be computationally intensive task. I have heard of map reduce technique. Should I use map reduce technique or there is some other technique that could be more computationally cheaper.


Solution

  • The general theory behind this would be string metric https://en.wikipedia.org/wiki/String_metric calculating distance between strings and then just use one of ready solutions:

    https://www.npmjs.com/package/fast-levenshtein

    https://www.npmjs.com/package/js-levenshtein

    https://www.npmjs.com/package/string-similarity

    or look for anything else you can find in google npm string distance

    Those are pretty fast so your issue may be more with size than speed.

    Or you can use fuzzy search https://glench.github.io/fuzzyset.js/

    Also, it's probable that this list you get already uses these algorithms underneath so just take first?