Search code examples
c#.netstringalgorithmstring-comparison

Compare string similarity


What is the best way to compare two strings to see how similar they are?

Examples:

My String
My String With Extra Words

Or

My String
My Slightly Different String

What I am looking for is to determine how similar the first and second string in each pair is. I would like to score the comparison and if the strings are similar enough, I would consider them a matching pair.

Is there a good way to do this in C#?


Solution

  • static class LevenshteinDistance
    {
        public static int Compute(string s, string t)
        {
            if (string.IsNullOrEmpty(s))
            {
                if (string.IsNullOrEmpty(t))
                    return 0;
                return t.Length;
            }
    
            if (string.IsNullOrEmpty(t))
            {
                return s.Length;
            }
    
            int n = s.Length;
            int m = t.Length;
            int[,] d = new int[n + 1, m + 1];
    
            // initialize the top and right of the table to 0, 1, 2, ...
            for (int i = 0; i <= n; d[i, 0] = i++);
            for (int j = 1; j <= m; d[0, j] = j++);
    
            for (int i = 1; i <= n; i++)
            {
                for (int j = 1; j <= m; j++)
                {
                    int cost = (t[j - 1] == s[i - 1]) ? 0 : 1;
                    int min1 = d[i - 1, j] + 1;
                    int min2 = d[i, j - 1] + 1;
                    int min3 = d[i - 1, j - 1] + cost;
                    d[i, j] = Math.Min(Math.Min(min1, min2), min3);
                }
            }
            return d[n, m];
        }
    }