Search code examples
algorithmalignmentbioinformaticsstring-matching

Find optimal local alignment of two strings using local & global alignments


I have a homework question that I trying to solve for many hours without success, maybe someone can guide me to the right way of thinking about it.

The problem:

We want to find an optimal local alignment of two strings S1 and S2, we know that there exists such an alignment with the two aligned substrings of S1 and S2 both of length at most q. Besides, we know that the number of the table cells with the maximal value, opt, is at most r. Describe an algorithm solving the problem in time O(mn+r*q^2) using working space of at most O(n+r+q^2).

Restrictions: run the algorithm of finding the optimal local alignment value, with additions to your choice (like the list of index pairs), only once. Besides, you can run any variant of the algorithm for solving the global optimal alignment problem as many times as you wish

I know the solution to this problem with running the local alignment many times and the global alignment only once, but not the opposite.

the global alignment algorithm: enter image description here

the local alignment algorithm: enter image description here

Any help would be appreciated.


Solution

  • The answer in case someone will be interested in this question in the future:

    1. Compute the optimal local alignment score OPT of both strings in $O(mn)$ time and $O(n)$ space by maintaining just a single row of the DP matrix. (Since we are only computing the score and don't need to perform traceback to build the full solution, we don't need to keep the full DP matrix.) As you do so, keep track of the highest cell value seen so far, as well as a list of the coordinates $(i, j)$ of cells having this value. Whenever a new maximum is seen, clear the list and update the maximum. Whenever a cell $\ge$ the current maximum is seen (including in the case where we just saw a new maximum), add the coordinates of the current cell to the list. At the end, we have a list of endpoints of all optimal local alignments; by assumption, there are at most $r$ of these.
    2. For each entry $(i, j)$ in the list:
      1. Set $R1$ to the reverse of the substring $S1[i-q+1..i]$, and $R2$ to the reverse of $S2[j-q+1..j]$.
      2. Perform optimal global alignment of $R1$ and $R2$ in $O(q^2)$ time, maintaining the full $O(q^2)$ DP matrix this time.
      3. Search for the highest entry in the matrix (also $O(q^2)$ time; or you can perform this during the previous step).
      4. If this entry is OPT, we have found a solution: Trace back towards the top-left corner from this cell to find the full solution, reverse it, and output it, and stop.

    By assumption, at least one of the alignments performed in the previous step reaches a score of OPT. (Note that reversing both strings does not change the score of an alignment.)

    Step 2 iterates at most $r$ times, and does $O(q^2)$ work each time, using at most $O(q^2)$ space, so overall the time and space bounds are met.

    (A simpler way, that avoids reversing strings, would be to simply perform local alignments of the length-$q$ substrings, but the restrictions appear to forbid this.)