Search code examples
c++algorithmlevenshtein-distanceedit-distance

Levenshtein Edit Distance is not calculating edit distance


I am trying to get my Levenshtein Edit Distance algorithm working but for some reason, the number of edits is coming out incorrect. I can't see where my mistake is and I was wondering if someone see's what I am doing incorrectly.

Input

5
ATCGTT
AGTTAC
ACGAAT
CCGTAAAT
TTACGACCAGT

expected output

Strand A: ATCGTT--
Strand B: A--GTTAC
Edit Distance: 4

Strand A: ATCG-TT
Strand B: A-CGAAT
Edit Distance: 3

Strand A: ATCGT---T
Strand B: -CCGTAAAT
Edit Distance: 5

Strand A: AT-CG----TT
Strand B: TTACGACCAGT
Edit Distance: 7

Strand A: AGTTAC
Strand B: ACGAAT
Edit Distance: 4

Strand A: -AGT-TAC
Strand B: CCGTAAAT
Edit Distance: 5

Strand A: --A-G-TTA-C
Strand B: TTACGACCAGT
Edit Distance: 8

Strand A: ACG--AAT
Strand B: CCGTAAAT
Edit Distance: 3

Strand A: --ACGA--A-T
Strand B: TTACGACCAGT
Edit Distance: 5

Strand A: --CCG-TAAAT
Strand B: TTACGACCAGT
Edit Distance: 7

my output

Strand A: ATCGT-
Strand B: AGTTAC
Edit Distance: 5

Strand A: ATC-T-
Strand B: ACGAAT
Edit Distance: 5

Strand A: ATC-T-
Strand B: CCGTAAAT
Edit Distance: 5

Strand A: A-C-T-
Strand B: TTACGACCAGT
Edit Distance: 10

Strand A: AGTTAC
Strand B: ACGAAT
Edit Distance: 5

Strand A: AG-TAC
Strand B: CCGTAAAT
Edit Distance: 6

Strand A: A--T-C
Strand B: TTACGACCAGT
Edit Distance: 7

Strand A: AC-AAT
Strand B: CCGTAAAT
Edit Distance: 7

Strand A: AC---T
Strand B: TTACGACCAGT
Edit Distance: 8

Strand A: CC-TAAAT
Strand B: TTACGACCAGT
Edit Distance: 8

findEditDistance

void EditDistance::findEditDistance()
{
    int upperValue, leftValue, diagonalValue;

    for (int i = 0; i < mLengthX; ++i)
    {
        table[i][0].stringLength = i;
    }

    for (int i = 0; i < mLengthY; ++i)
    {
        table[0][i].stringLength = i;
    }

    for (int i = 1; i < mLengthX; ++i)
    {
        for (int j = 1; j < mLengthY; ++j)
        {
            if (mStringX[i] == mStringY[j])
            {
                table[i][j].direction = DIAGONAL;
                table[i][j].stringLength = table[i - 1][j -1].stringLength;
            }
            else
            {
                upperValue = table[i - 1][j].stringLength;
                leftValue = table[i][j - 1].stringLength;
                diagonalValue = table[i - 1][j - 1].stringLength;

                if (upperValue < leftValue)
                {
                    if (upperValue < diagonalValue)
                    {
                        //upper is the lowest
                        table[i][j].stringLength = table[i - 1][j].stringLength + 1;
                        table[i][j].direction = UP;
                    }
                    else
                    {
                        //diagonal is lowest
                        table[i][j].stringLength = table[i - 1][j -1].stringLength + 1;
                        table[i][j].direction = DIAGONAL;
                    }
                }
                else if (leftValue < diagonalValue)
                {
                    //left is lowest
                    table[i][j].stringLength = table[i][j - 1].stringLength + 1;
                    table[i][j].direction = LEFT;
                }
                else
                {
                    //diagonal is lowest
                    table[i][j].stringLength = table[i - 1][j -1].stringLength + 1;
                    table[i][j].direction = DIAGONAL;
                }
            }
        }
    }   
}

getDistance

void EditDistance::getDistance()
{
    int i = mStringX.length() - 1;
    int j = mStringY.length() - 1;

    numEdits = 0;

    updateStrands (i, j);
}

updateStrands

void EditDistance::updateStrands (int i, int j)
{
    if (i == 0 || j == 0)
    {
        return;
    }

    if (table[i][j].direction == DIAGONAL)
    {
        ++numEdits;
        updateStrands (i - 1, j - 1);
    }
    else if (table[i][j].direction == UP)
    {
        mStringY[j] = '-';
        ++numEdits;
        updateStrands (i - 1, j);
    }
    else
    {
        mStringX[i] = '-';
        ++numEdits;
        updateStrands (i, j - 1);
    }
}

Solution

  • The problem with the edit distance is in your updateStrands. It counts diagonal moves as 1, when in fact a diagonal move can have a distance of 1 (substitution) or 0 (match). You could fix this in updateStrands, but there's really no need to do the calculation there at all, when the number is already in the lower-right corner of table.

    If you want the correct "strands" (e.g. "ATCGTT--" and "A--GTTAC"), you will have to make corrections in updateStrands (you change elements of the strings where you should insert), getDistance (you start in the wrong place) and findEditDistance (you neglect to assign values to direction along the upper and left edges when you set stringLength to i).