Search code examples
levenshtein-distance

Showing the result of Levenshtein Distance


Given two strings (s1, s2), Levenshtein Distance is the minimum number of operations needed to change s1 to s2 or vice versa.

I want to show the result of changing s1 to s2. For example, changing Sunday to Saturday needs 3 operations. I need to show S++u+day. "+" is for each operations needed.


Solution

  • Here is a javascript snippet that returns what you want. If you are familiar with the dynamic programming algorithm you should be able follow this code. All the string operations/manipulation of the return string r and handling of min/curMin are what's changed from the original version.

    function edits(t, s) {
        var r = "";
        if (s === t) {
            return s;
        }
        var n = s.length, m = t.length;
        if (n === 0 || m === 0) {
            return "+".repeat(n + m);
        }
        var x, y, a, b, c, min = 0;
        var p = new Array(n);
        for (y = 0; y < n;)
            p[y] = ++y;
        for (x = 0; x < m; x++) {
            var e = t.charCodeAt(x);
            c = x;
            b = x + 1;
            var currMin = min;
            min = n + 1;
            for (y = 0; y < n; y++) {
                a = p[y];
                if (a < c || b < c) {
                    b = (a > b ? b + 1 : a + 1);
                }
                else {
                    if (e !== s.charCodeAt(y)) {
                        b = c + 1;
                    }
                    else {
                        b = c;
                    }
                }
                if (b < min) {
                    min = b;
                }
                p[y] = b;
                c = a;
            }
            if (min > currMin) {
                r += "+";
            }
            else {
                r += t[x];
            }
        }
        return r;
    }
    

    EDIT: The implementation above is an version optimized for speed and space so might be harder to read. The implemetation below is a modified version of the JS version from Wikipedia and should be easier to follow.

    function getEditDistance(a, b) {
        if(a.length === 0) return "+".repeat(b.length);
        if(b.length === 0) return "+".repeat(a.length);
    
        var matrix = [];
    
        // increment along the first column of each row
        var i;
        for(i = 0; i <= b.length; i++){
            matrix[i] = [i];
        }
    
        // increment each column in the first row
        var j;
        for(j = 0; j <= a.length; j++){
            matrix[0][j] = j;
        }
    
        var r = "", min = 0;;
    
        // Fill in the rest of the matrix
        for(i = 1; i <= b.length; i++){
    
            var currMin = min;
            min = a.length + 1;
    
            for(j = 1; j <= a.length; j++){
    
                if(b.charAt(i-1) == a.charAt(j-1)){
                    matrix[i][j] = matrix[i-1][j-1];
                } else {
                    matrix[i][j] = Math.min(matrix[i-1][j-1] + 1, // substitution
                        Math.min(matrix[i][j-1] + 1, // insertion
                            matrix[i-1][j] + 1)); // deletion
                }
    
                if (matrix[i][j] < min) {
                    min = matrix[i][j];
                }
            }
    
            if (min > currMin) {
                r += "+";
            }
            else {
                r += b[i-1];
            }
        }
        return r;
    }
    

    EDIT2: Added explanation of the algorithm and example output

    Below is the levenshtein matrix from the input strings "kitten" and "sitting". What I changed in the original algorithm is that I added a check if the current rows minimum value is larger then the previous rows minimum, and if it is, there is an edit in the current row and thus adding a "+". If not and the " edit cost" for the current row is the same as the previous. Then there is no edit necessary and we just add the current character to the result string. You can follow the algorithm row by row in the image (starting at row 1, and column 1)

    enter image description here

    Examples:

    > getEditDistance("kitten", "sitting");
    '+itt+n+'
    > getEditDistance("Sunday", "Saturday");
    'S++u+day'