c#, .net, string, similarity

How can I calculate similarity between two strings in C#?


I'm looking to assess similarity (including case) between two strings and give a value between 0 and 1.

I tried a Levenshtein distance implementation, but it only returns integers and does not distinguish a difference in case from a different letter.

For example, comparing "ABCD" with "Abcd" gives a distance of 3, and "AOOO" also gives a distance of 3, but "Abcd" is clearly a better match than "AOOO".

So, compared to "ABCD", I want "ABcd" to be the most similar, then "Abcd", then "AOOO", then "AOOOO".

I've also looked here but I am not looking for a variable length algorithm.

Thanks


Solution

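  • Try something like this, which yields a normalized distance between 0 (identical) and 1; use 1 - d if you want a similarity score:

    double d = (LevenshteinDist(s, t) + LevenshteinDist(s.ToLower(), t.ToLower())) /
               (2.0 * Math.Max(s.Length, t.Length));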
    

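    If you want to give less importance to case differences than to letter differences, you can give different weights to the two terms. A case-only difference counts only in the first (case-sensitive) term, while a different letter counts in both, so a smaller first weight de-emphasizes case: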

    double d = (0.15*LevenshteinDist(s, t) + 
                0.35*LevenshteinDist(s.ToLower(), t.ToLower())) /
               Math.Max(s.Length, t.Length);
    

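    Note that the weights sum to 0.5, which is why there is no division by 2.0 here. As a result, d ranges from 0 to 0.5 rather than 0 to 1; doubling the weights (0.3 and 0.7) restores the full 0 to 1 range.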
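
To make the weighted formula above concrete, here is a minimal sketch that fills in the pieces the answer leaves out. The class name StringSimilarity, the method Similarity, its parameter names, and the normalization by the sum of the weights are assumptions made for illustration, not part of the original answer. LevenshteinDist is written as the textbook dynamic-programming version, since the answer does not show its own, and ToLowerInvariant is used instead of ToLower to avoid culture-dependent casing, which is also a choice made here.

```csharp
using System;

// Hypothetical helper class; names are illustrative, not from the original answer.
public static class StringSimilarity
{
    // Textbook Levenshtein distance: insertions, deletions and
    // substitutions each cost 1. The comparison is case-sensitive.
    public static int LevenshteinDist(string s, string t)
    {
        var d = new int[s.Length + 1, t.Length + 1];
        for (int i = 0; i <= s.Length; i++) d[i, 0] = i;
        for (int j = 0; j <= t.Length; j++) d[0, j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                d[i, j] = Math.Min(
                    Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
                    d[i - 1, j - 1] + cost);
            }
        }
        return d[s.Length, t.Length];
    }

    // Returns 1.0 for identical strings and 0.0 for maximally different ones.
    // Assumption: dividing by the sum of the weights keeps the result in [0, 1]
    // regardless of the weights chosen (both weights must not be zero).
    public static double Similarity(string s, string t,
                                    double caseWeight = 0.15,
                                    double letterWeight = 0.35)
    {
        int maxLen = Math.Max(s.Length, t.Length);
        if (maxLen == 0) return 1.0; // two empty strings are identical

        int caseSensitive = LevenshteinDist(s, t);
        int caseInsensitive = LevenshteinDist(s.ToLowerInvariant(), t.ToLowerInvariant());

        double distance = (caseWeight * caseSensitive + letterWeight * caseInsensitive)
                          / ((caseWeight + letterWeight) * maxLen);
        return 1.0 - distance;
    }
}
```

Under these assumptions, StringSimilarity.Similarity("ABCD", x) should come out at roughly 0.85 for "ABcd", 0.775 for "Abcd", 0.25 for "AOOO", and 0.2 for "AOOOO", which matches the ordering asked for in the question.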