I've been using String Score for a lot of projects. It's great for sorting lists, like names, countries, etc.
Right now, I'm working on a project where I want to match a term against a bigger set of text, not just a few words. Like, a paragraph.
Given the following two strings:
string1 = "I want to eat.";
string2 = "I want to eat. Let's go eat. All this talk about eating is making me hungry. Ready to eat?";
I'd like the term "eat" to rank string2 higher than string1. However, string1 scores higher:
string1.score('eat');
> 0.5261904761904762
string2.score('eat');
> 0.4477777777777778
Maybe I'm wrong in thinking string2 should score higher, and I'd love to hear arguments for that logic, if that is your logic. Otherwise, any ideas on a more contextual JavaScript matching algorithm?
If the score is not taking repetitions into account, then only one occurrence of "eat" in string2 adds to the score, and the other occurrences of "eat" are treated as unmatched garbage that counts against the total score.
Many string similarity metrics behave this way. With edit distance, for example, the more non-matching characters there are, the lower the score, and repetitions are treated as non-matching.
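To make the edit-distance point concrete, here is a minimal sketch of Levenshtein distance in JavaScript. It is not what String Score does internally; the `levenshtein` and `similarity` helpers, and the choice to normalize by the longer string's length, are assumptions made purely for illustration.

```javascript
// Classic dynamic-programming Levenshtein (edit) distance.
// Illustrative only: this is NOT the algorithm String Score uses.
function levenshtein(a, b) {
  // prev[j] = distance between the first (i - 1) chars of a and the first j chars of b
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical normalization to the range 0..1, where 1 means identical.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

const string1 = "I want to eat.";
const string2 = "I want to eat. Let's go eat. All this talk about eating is making me hungry. Ready to eat?";

console.log(similarity("eat", string1));
console.log(similarity("eat", string2));
```

Under this metric string2 comes out lower than string1: every character beyond one matched "eat" is an edit, so the extra occurrences only add to the distance.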
It's not clear to me from reading the source what algo it is using, but the score variables
var total_character_score = 0,
start_of_string_bonus,
abbreviation_score,
fuzzies=1,
final_score;
don't seem to take into account multiple repetitions.
If you want multiple occurrences to count, then it sounds like what you want is not a string-similarity algo, but a fuzzy match algo so you can find the number of matches.
Maybe yeti witch will work for you.
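If a library is more than you need, counting matches can be sketched by hand. The following is a rough illustration, assuming that a plain case-insensitive substring count is an acceptable stand-in for "number of matches"; it is exact matching, not fuzzy matching, and `countOccurrences` is a hypothetical helper, not part of String Score or any other library.

```javascript
// Count non-overlapping, case-insensitive occurrences of `term` in `text`.
// Hypothetical helper for illustration; exact substring matching only.
function countOccurrences(text, term) {
  if (term.length === 0) return 0; // avoid an infinite loop on an empty term
  const haystack = text.toLowerCase();
  const needle = term.toLowerCase();
  let count = 0;
  let pos = haystack.indexOf(needle);
  while (pos !== -1) {
    count++;
    // Resume searching just past the match we found
    pos = haystack.indexOf(needle, pos + needle.length);
  }
  return count;
}

const string1 = "I want to eat.";
const string2 = "I want to eat. Let's go eat. All this talk about eating is making me hungry. Ready to eat?";

console.log(countOccurrences(string1, "eat")); // 1
console.log(countOccurrences(string2, "eat")); // 4 ("eating" counts as a match)

// Rank texts so that the one with the most occurrences comes first.
const ranked = [string1, string2].sort(
  (a, b) => countOccurrences(b, "eat") - countOccurrences(a, "eat")
);
console.log(ranked[0] === string2); // true
```

A raw count favors long texts, so you may want to combine it with a similarity score or divide by text length, depending on what "more relevant" means for your data.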