Search code examples
javalevenshtein-distance

Levenshtein distance in java outputting the wrong number


For my university assignment in java I have been asked to provide "extra analytics functions" I decided to use Levenshtein distance but I have an issue where the number outputted to the console is one less than the actual answer. So the distance between "cat" and "hat" should be 1 but it's displaying as 0

public class Levenshtein {

public Levenshtein(String first, String second) {

    char [] s = first.toCharArray();
    char [] t = second  .toCharArray();
    int Subcost = 0;

    int[][] array = new int[first.length()][second.length()];

    for (int i = 0; i < array[0].length; i++)
    {
        array[0][i] = i;
    }

    for (int j = 0; j < array.length; j++)
    {

        array [j][0]= j;
    }

    for (int i = 1; i < second.length(); i++)
    {
        for (int j = 1; j < first.length(); j++)
        {
            if (s[j] == t [i])
            {
                Subcost = 0;
            }
            else
            {
                Subcost = 1;
            }

            array [j][i] = Math.min(array [j-1][i] +1,
                    Math.min(array [j][i-1] +1,
                            array [j-1][i-1] + Subcost) );
        }
    }

    UI.output("The Levenshtein distance is -> " + array[first.length()-1][second.length()-1]);

}

}


Solution

  • Apparently you're using the following algorithm:

    https://en.wikipedia.org/wiki/Levenshtein_distance#Iterative_with_full_matrix

    I think you were not too accurate with indices. I'm not sure where exactly the problem is, but here is a working version:

    public int calculateLevenshteinDistance(String first, String second) {
    
        char[] s = first.toCharArray();
        char[] t = second.toCharArray();
        int substitutionCost = 0;
    
        int m = first.length();
        int n = second.length();
    
        int[][] array = new int[m + 1][n + 1];
    
        for (int i = 1; i <= m; i++) {
            array[i][0] = i;
        }
    
        for (int j = 1; j <= n; j++) {
    
            array[0][j] = j;
        }
    
        for (int j = 1; j <= n; j++) {
            for (int i = 1; i <= m; i++) {
                if (s[i - 1] == t[j - 1]) {
                    substitutionCost = 0;
                } else {
                    substitutionCost = 1;
                }
    
                int deletion = array[i - 1][j] + 1;
                int insertion = array[i][j - 1] + 1;
                int substitution = array[i - 1][j - 1] + substitutionCost;
                int cost = Math.min(
                        deletion,
                        Math.min(
                                insertion,
                                substitution));
                array[i][j] = cost;
            }
        }
    
        return array[m][n];
    }