How can I implement sequence alignment with minimum subsequence length constraint. For example let for these inputs minimum sub-sequence length be 3. Using Smith-Waterman gives output like below.
ATACGGACC
|| |||
ATCATAACC
But instead I need like below.
---ATACGGACC
||| |||
ATCATA---ACC
Is there a know algorithm for this, or do you know an approach?
Thanks in advance.
Look at how Smith-Waterman works. You have two sequences (S1 of length N, S2 of length M) and you create an NxM matrix (let's call it A) where A(i,j) is "the best score alignment of S1[1..i] and S2[1..j]." To do that, you write a recurrance about how to get A(i,j) based on the last element - A(i,j) is the best of
A(i-1, j-1) + match/mismatch score
A(i,j-1) + indel score
A(i-1,j) + indel score
Here's the basic idea; you may need to tweak it a little.
To do what you're asking, you need two matrices. Let A(i,j) be "the best score alignment of S1[1..i] and S2[1..j] ending with a match" and B(i,j) be "the best score alignment of S1[1..i] and S2[1..j] ending with an indel."
Let's start with B, because it's easier. B(i,j) is the best of
A(i,j-1) + indel score
A(i-1,j) + indel score
B(i,j-1) + indel score
B(i-1,j) + indel score
And A(i,j) is the best of
A(i-1, j-1) + match/mismatch score
B(i-3, j-3) + match/mismatch score for the three