Gusfield (Algorithms on Strings, Trees, and Sequences, Section 11.8.6) describes a dynamic programming algorithm for finding the best alignment between two sequences A and B under the assumption that the penalty assigned to a gap of length x in one of the aligned sequences is of the form R+Sx for constants R and S. In the special case where S=0, there is a penalty for beginning a gap but no penalty for lengthening it. It seems to me that there is an error in Gusfield's formulae and I hope that someone can help me understand the true state of affairs.
Gusfield defines four functions V(i,j), G(i,j), E(i,j) and F(i,j) with the following properties:
The recurrences for i and j greater than or equal to 1 are:
V(i,j)=max[E(i,j),F(i,j),G(i,j)]
G(i,j)=V(i,j)+w(A[i],B[j]) where w is the score for a match/mismatch.
E(i,j)=max(E(i,j-1),V(i,j-1)-R]
F(i,j)=max(F(i-1,j),V(i-1,j)-R]
Finally the boundary conditions are given as:
V(i,0)=E(i,0)=-R
V(0,j)=F(0,j)=-R
With all that as preliminaries, consider the situation where we have two sequences each of length one, so that A=p and B=q. There are only three possible alignments in this situation:
p p- -p
q -q q-
These have scores respectively w(p,q), -2R, -2R. In particular we should have E(0,1)=F(1,0)=-2R. However, the recurrences give that E(0,1) and F(1,0) are both bigger than or equal to -R.
This error in the boundary conditions has consequences. Suppose for example that A=pppppp...p and B=qqqq...q with p and q different. The alignment that sets A completely off from B:
A-
-B
should score as -2R (it has two gaps) and so this alignment is eventually optimal assuming w(p,q)<0.
Gusfield's algorithm does not seem to handle this case correctly.
In practical situations this probably doesn't matter, because typical strings have a lot of matches and so the boundary cases don't contribute to the solution.
Comments/criticisms welcome.
Yes, two of the equations are actually incorrect. They should be
F(i,j)=max(F(i,j-1),V(i,j-1)-R]
E(i,j)=max(E(i-1,j),V(i-1,j)-R]
Since E(i,j) is the best possible score for alignments of these prefixes under the condition that A[i] is paired against a gap in B, the alignment is made of the optimal alignment of A[1:i-l] against B[1:j] and an l-long gap (the indels are against A[i-l+1:i]).