
Best Alignment of String B in a Substring of A -- Bioinformatics


I have two strings A and B, let's say

A = AATCGGATATAG
B = CGATA

Some of you may know the two standard types of alignment, global and local.

But I would like to implement an alignment that finds the substring of A which, when aligned against the whole of B, yields the best alignment.

For example:

A,B -- Alignment algorithm --> AATCGGATATAG 
                                  CG-ATA

So far I've been using the Smith-Waterman algorithm.

Does anyone have any suggestions for solving this problem?

Thanks in advance!


Solution

  • Smith-Waterman is still the algorithm you should use. To get the full sequence aligned, change your gap penalty to 0. This makes S-W favor gaps over mismatches and add as many gaps as are needed to include the whole sequence.

    For example, setting the gap penalty to 0 with the standard NUC.4.4 nucleotide substitution matrix produces this alignment:

    A =  AATCGGATATAG
    B =     C-GATA
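
To try this out, here is a minimal Smith-Waterman sketch in Python with a linear gap penalty. It is only an illustration, not a tuned implementation. The function name `smith_waterman` and its parameters are made up for this example, and the scores of 5 for a match and -4 for a mismatch are an assumption standing in for the NUC.4.4 values for unambiguous bases.

```python
def smith_waterman(a, b, match=5, mismatch=-4, gap_penalty=0):
    """Local alignment of a and b; returns (score, aligned_a, aligned_b).

    Scores are assumptions for illustration: 5 / -4 for match / mismatch,
    and a linear gap penalty that is subtracted for every gap character.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the score matrix, remembering the first cell with the highest score.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + sub,      # align a[i-1] with b[j-1]
                H[i - 1][j] - gap_penalty,  # gap in b
                H[i][j - 1] - gap_penalty,  # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell, preferring the diagonal move on ties.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif H[i][j] == H[i - 1][j] - gap_penalty:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    score, aligned_a, aligned_b = smith_waterman("AATCGGATATAG", "CGATA")
    print(score)
    print(aligned_a)
    print(aligned_b)
```

With the strings from the question and a gap penalty of 0, this should align all of B as C-GATA against the CGGATA stretch of A. Ties are common when gaps are free, so a different traceback order can return another alignment with the same score.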