Search code examples
c#algorithmdistancelevenshtein-distance

Is this Levenshtein Distance algorithm correct?


I have written the algorithm below to compute the Levenshtein distance, and it seems to return the correct results based on my tests. The time complexity is O(n+m), and the space is O(1).

All the existing algorithms I've seen only for this have space complexity O(n*m), as they create a matrix. Is there something wrong with my algorithm?

public static int ComputeLevenshteinDistance(string word1, string word2)
{
    var index1 = 0;
    var index2 = 0;
    var numDeletions = 0;
    var numInsertions = 0;
    var numSubs = 0;
    while (index1 < word1.Length || index2 < word2.Length)
    {
        if (index1 == word1.Length)
        {
            // Insert word2[index2]
            numInsertions++;
            index2++;
        }
        else if (index2 == word2.Length)
        {
            // Delete word1[index1]
            numDeletions++;
            index1++;
        }
        else if (word1[index1] == word2[index2])
        {
            // No change as word1[index1] == word2[index2]
            index1++;
            index2++;
        }
        else if (index1 < word1.Length - 1 && word1[index1 + 1] == word2[index2])
        {
            // Delete word1[index1]
            numDeletions++;
            index1++;
        }
        else if (index2 < word2.Length - 1 && word1[index1] == word2[index2 + 1])
        {
            // Insert word2[index2]
            numInsertions++;
            index2++;
        }
        else
        {
            // Substitute word1[index1] for word2[index2]
            numSubs++;
            index1++;
            index2++;
        }
    }

    return numDeletions + numInsertions + numSubs;
}

Solution

  • Was a comment, but I feel it is probably suitable as an answer:

    Short answer is "no", if you want the true shortest distance for any given inputs.

    The reason your code appears more efficient (and the reason that other implementations create a matrix instead of doing what you're doing) is that your stepwise implementation ignores a lot of potential solutions.

    Examples @BenVoigt gave illustrate this, another perhaps clearer illustration is ("aaaardvark", "aardvark") returns 8, should be 2: it's getting tripped up because it's matching the first a and thinking it can move on, when in fact a more optimal solution would be to consider the first two characters insertions.