string perl dna-sequence

Align two DNA sequences and find complementary regions


I looked through the list for a clue but couldn't find one, so apologies if this is a repeated topic.

I am a Perl beginner and I am trying to write a program in Perl that takes two DNA sequences, reverses the second one, and finds the longest complementary region between them, that is:

input:

CGTAAATCTATCTT
CATGCGTCTTTACG

output:

CGTAAATCTATCTT
GCATTT--------

I have no problem finding the reverse of the second sequence; however, my programming skills in Perl are rudimentary. Do I need to combine for and foreach loops?


Solution

  • Perhaps this is what you want (crudely):

    #!/usr/bin/env perl
    use strict;
    use warnings;

    # Expect exactly two sequences of equal length on the command line.
    die "Usage: $0 SEQ1 SEQ2 (two sequences of equal length)\n"
        unless @ARGV == 2 && length($ARGV[0]) == length($ARGV[1]);

    my @seq1 = split //, $ARGV[0];
    my @seq2 = split //, scalar reverse $ARGV[1];   # second sequence, reversed

    my @comp;
    for my $n (0 .. $#seq1) {
        # Keep the reversed base where it pairs with seq1 (A-T, G-C); mask it otherwise.
        if (   ($seq1[$n] eq 'A' && $seq2[$n] eq 'T')
            || ($seq1[$n] eq 'T' && $seq2[$n] eq 'A')
            || ($seq1[$n] eq 'G' && $seq2[$n] eq 'C')
            || ($seq1[$n] eq 'C' && $seq2[$n] eq 'G') ) {
            push @comp, $seq2[$n];
        }
        else {
            push @comp, '-';
        }
    }
    print @seq1, "\n", @comp, "\n";
    

    ...which, when run, marks every complementary position, including the isolated match near the end:

    # ./compseq CGTAAATCTATCTT CATGCGTCTTTACG
    CGTAAATCTATCTT
    GCATTT------A-
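
The expected output in the question keeps only the first six bases, which suggests that only the longest contiguous complementary run is wanted rather than every matching position. Below is a minimal sketch of that variant. It assumes uppercase A/C/G/T input and equal-length sequences; `longest_complementary_run` is a hypothetical helper name, not part of the answer above, and when two runs tie in length the earliest one is kept.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: reverse $seq2, then mask every position except the
# longest contiguous run where it is complementary to $seq1.
sub longest_complementary_run {
    my ($seq1, $seq2) = @_;
    die "sequences must have equal length\n"
        unless length($seq1) == length($seq2);

    my $rev = scalar reverse $seq2;        # reversed second sequence
    (my $comp = $rev) =~ tr/ACGT/TGCA/;    # complement of each reversed base

    # Scan once, tracking the current run of matches and the best run so far.
    my ($best_start, $best_len, $start, $len) = (0, 0, 0, 0);
    for my $i (0 .. length($seq1) - 1) {
        if (substr($seq1, $i, 1) eq substr($comp, $i, 1)) {
            $start = $i if $len == 0;      # a new run begins here
            $len++;
            ($best_start, $best_len) = ($start, $len) if $len > $best_len;
        }
        else {
            $len = 0;                      # run broken
        }
    }

    # Start from an all-dash line and copy in the bases of the best run.
    my $masked = '-' x length($seq1);
    substr($masked, $best_start, $best_len,
           substr($rev, $best_start, $best_len));
    return $masked;
}

# Sequences taken from the question.
my ($seq1, $seq2) = ('CGTAAATCTATCTT', 'CATGCGTCTTTACG');
print "$seq1\n", longest_complementary_run($seq1, $seq2), "\n";
```

With the sequences from the question this prints `CGTAAATCTATCTT` followed by `GCATTT--------`. A single `for` loop is enough here; no nested `for`/`foreach` is needed because the two sequences are compared position by position.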