OK this is what I want to do:
Get more than two strings and "align" them (no DNA/RNA sequence or the like, just regular strings with not like 1000 items in each of them)
I've already done some work with pairwise alignment (align two strings) however the "gaps" create some issues for me when trying to align more than one pair.
Example (one I'm currently testing):
ABCDEF
ABGHCEEF
AJKLBCDYEOF
AB--CDEF
ABGHCEEF
=======================
AB--C-EF
A-B--C--E-F
AJKLBCDYEOF
=======================
A----C--E-F
And another (more illustrative) example :
http://nest.drkameleon.com
http://www.google.com
http://www.yahoo.com
http://nest.drkameleon.com
http://-www.--google--.com
=======================
http://----.------le--.com
http://----.------le--.com
http://-www.-----yahoo.com
=======================
http://----.----------.com
What I'm currently doing :
R1
)R1
and C
(result in R2
)R2
and D
So what's in your mind? How could I go for that? Is there a better way? (Of course, there must be...)
I'd rather do that in Perl/Python or something along these lines, however any type of code/reference would be more than welcome! :-)
I think you may be able to cast this problem as a more general string diff problem instead of a string alignment. Consider how GNU diff
is used for finding differences between two files, and use the same algorithms as are used to perform an N-way diff
.
I'm not sure if the time/memory complexity of this approach is amenable to your needs, but you can at least think about the problem this way.