Tags: javascript, string-matching, fuzzy-search

Looking for a better JavaScript text-matching scoring system


I've been using String Score for a lot of projects. It's great for sorting lists, like names, countries, etc.

Right now, I'm working on a project where I want to match a term against a larger body of text, such as a paragraph, not just a few words.

Given the following two strings:

var string1 = "I want to eat.";
var string2 = "I want to eat. Let's go eat. All this talk about eating is making me hungry. Ready to eat?";

I'd like a search for the term "eat" to score string2 higher than string1. However, string1 scores higher:

string1.score('eat');
> 0.5261904761904762

string2.score('eat');
> 0.4477777777777778

Maybe I'm wrong to think string2 should score higher; if so, I'd love to hear the argument. Otherwise, any ideas for a more context-aware JavaScript matching algorithm?


Solution

  • If the score doesn't take repetitions into account, then only one occurrence of "eat" in string2 adds to the score. The other occurrences of "eat" are treated as unmatched characters, and those count against the total score.

    Many string-similarity metrics behave this way. With edit distance, for example, the more non-matching characters there are, the larger the distance (i.e. the lower the similarity), and repeated occurrences are just more non-matching characters.

    It's not clear to me from reading the source which algorithm it uses, but the score variables

    var total_character_score = 0,
      start_of_string_bonus,
      abbreviation_score,
      fuzzies=1,
      final_score;
    

    don't seem to account for repeated occurrences.

    If you want multiple occurrences to count, then what you want is probably not a string-similarity algorithm but a fuzzy-matching algorithm, so that you can find the number of matches.

    Maybe yeti witch will work for you.
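Here's a rough sketch in plain JavaScript to make the point above concrete. It doesn't use string_score or yeti witch: `levenshtein` and `countMatches` are hypothetical helpers written only for this illustration, and `countMatches` is an exact, case-insensitive word-prefix count rather than a real fuzzy matcher. Edit distance from "eat" grows with every extra character, so string1 comes out closer, while simply counting occurrences ranks string2 higher.

```javascript
// Sketch for illustration only: levenshtein() and countMatches() are
// hypothetical helpers, not part of string_score or yeti witch.

// Classic Levenshtein edit distance: each insertion, deletion or
// substitution costs 1. Keeps two rows instead of a full matrix.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a[i - 1]
        curr[j - 1] + 1,    // insert b[j - 1]
        prev[j - 1] + cost  // substitute (or keep a matching character)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Exact (not fuzzy) occurrence count: case-insensitive matches of `term`
// at the start of a word, so "eat" matches "eat" and "eating" but not
// "great". Assumes `term` starts with a word character.
function countMatches(text, term) {
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const re = new RegExp("\\b" + escaped, "gi");
  return (text.match(re) || []).length;
}

const string1 = "I want to eat.";
const string2 = "I want to eat. Let's go eat. All this talk about eating is making me hungry. Ready to eat?";

// Every extra character in string2 adds to its distance from "eat".
console.log(levenshtein("eat", string1)); // 11
console.log(levenshtein("eat", string2)); // 87

// Counting occurrences ranks string2 higher instead.
console.log(countMatches(string1, "eat")); // 1
console.log(countMatches(string2, "eat")); // 4
```

A raw count naturally favors longer texts that repeat the term; if that's not what you want, dividing by the text's word count is one simple way to normalize it.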