I am new to Perl and regular expressions. I need an example of how to write a program that finds perfect palindromes in several protein sequences (say, 4 sequences of 200 amino acids each, stored in a file). The program has to report each palindrome and its position within the sequence. Sample input:
>TRE|Q47404|Q47404 (409 AA) Glycosyl transferase [Escherichia coli]
MIFDASLKKLRKLFVNPIGFFRDSWFFNSKNKAEELLSPLKIKSKNIFIVAHLGQLKKAE
LFIQKFSRRSNFLIVLATKKNTEMPRLILEQMNKKLFSSYKLLFIPTEPNTFSLKKVIWF
YNVYKYIVLNSKAKDAYFMSYAQHYAIFIWLFKKNNIRCSLIEEGTGTYKTEKKKPLVNI
NFYSWIINSIILFHYPDLKFENVYGTFPNLLKEKFDAKKIFEFKTIPLVKSSTRMDNLIH
>TRE|O06435|O06435 (492 AA) SynE [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNN
LLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTI
QPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNN
LHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDY
IFVSQRYPVSDEVYYKTIVETLNQMSLRIEGKIFIKLHPKEMENKNIMSLFLNMVTINPR
>TRE|Q8VRL9|Q8VRL9 (492 AA) SiaD [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNN
LLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTI
QPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNN
LHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDY
I need the perfect palindromes in these sequences, along with their positions. I have read many articles but could not find a workable approach. Please suggest some techniques or programs for this.
There are three regex features that are required for this challenge:
perlretut - Recursive Patterns — To find palindromes
perlretut - Positive Lookahead Assertions — To find matches that overlap
perlretut - Position Information — To determine where the matches are in the string.
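As a small illustration of the last two features only (not the full palindrome search), here is a minimal sketch that finds every three-letter window of the form X?X, including overlapping ones, and records where each starts. The helper name `overlapping_triples` and the test string are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return [offset, text] pairs for every 3-letter
# window whose first and last letters are equal, overlaps included.
sub overlapping_triples {
    my ($seq) = @_;
    my @hits;
    # The lookahead consumes no characters, so the next attempt starts one
    # character further on and overlapping hits are not skipped.
    while ($seq =~ /(?=((\w)\w\2))/g) {
        push @hits, [ $-[0], $1 ];    # $-[0] is the 0-based start of the match
    }
    return @hits;
}

printf "%d - %s\n", @$_ for overlapping_triples('GTGTYKY');
# prints:
# 0 - GTG
# 1 - TGT
# 4 - YKY
```

Without the lookahead, a plain `/((\w)\w\2)/g` would consume `GTG` and resume after it, so the overlapping `TGT` at offset 1 would be missed.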
Putting these together gives this result:
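use strict;
use warnings;

# Recursive palindrome pattern. It is meant to be interpolated inside an outer
# capture group: that group becomes group 1, so (?1) recurses into the whole
# palindrome, and \g{-1} refers back to the letter captured by (\w).
my $pp = qr/(?: (\w) (?1) \g{-1} | \w? )/ix;

# Paragraph mode: records must be separated by a blank line.
local $/ = '';

while (<DATA>) {
    chomp;
    my ($header, @lines) = split "\n";
    my $data = join '', @lines;

    print "$header\n$data\n";

    # The lookahead consumes nothing, so overlapping palindromes are found.
    while ($data =~ /(?=($pp))/g) {
        # $-[0] is the 0-based offset of the match; skip palindromes shorter than 3.
        print "$-[0] - $1\n" if length($1) > 2;
    }
}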
__DATA__
>TRE|Q47404|Q47404 (409 AA) Glycosyl transferase [Escherichia coli]
MIFDASLKKLRKLFVNPIGFFRDSWFFNSKNKAEELLSPLKIKSKNIFIVAHLGQLKKAE
LFIQKFSRRSNFLIVLATKKNTEMPRLILEQMNKKLFSSYKLLFIPTEPNTFSLKKVIWF
YNVYKYIVLNSKAKDAYFMSYAQHYAIFIWLFKKNNIRCSLIEEGTGTYKTEKKKPLVNI
NFYSWIINSIILFHYPDLKFENVYGTFPNLLKEKFDAKKIFEFKTIPLVKSSTRMDNLIH

>TRE|O06435|O06435 (492 AA) SynE [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNN
LLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTI
QPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNN
LHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDY
IFVSQRYPVSDEVYYKTIVETLNQMSLRIEGKIFIKLHPKEMENKNIMSLFLNMVTINPR

>TRE|Q8VRL9|Q8VRL9 (492 AA) SiaD [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNN
LLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTI
QPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNN
LHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDY
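Outputs (each hit is the 0-based start offset followed by the palindrome; at most one palindrome is reported per start position):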
>TRE|Q47404|Q47404 (409 AA) Glycosyl transferase [Escherichia coli]
MIFDASLKKLRKLFVNPIGFFRDSWFFNSKNKAEELLSPLKIKSKNIFIVAHLGQLKKAELFIQKFSRRSNFLIVLATKKNTEMPRLILEQMNKKLFSSYKLLFIPTEPNTFSLKKVIWFYNVYKYIVLNSKAKDAYFMSYAQHYAIFIWLFKKNNIRCSLIEEGTGTYKTEKKKPLVNINFYSWIINSIILFHYPDLKFENVYGTFPNLLKEKFDAKKIFEFKTIPLVKSSTRMDNLIH
6 - LKKL
29 - KNK
40 - KIK
42 - KSK
46 - IFI
66 - SRRS
86 - LIL
123 - YKY
131 - KAK
146 - IFI
164 - GTG
165 - TGT
172 - KKK
178 - NIN
211 - KEK
220 - FEF
>TRE|O06435|O06435 (492 AA) SynE [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNNLLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTIQPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNNLHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDYIFVSQRYPVSDEVYYKTIVETLNQMSLRIEGKIFIKLHPKEMENKNIMSLFLNMVTINPR
26 - FSSF
55 - KLK
70 - MKM
114 - KLLK
135 - SLLS
137 - LSL
154 - TAT
205 - NAN
220 - STS
222 - SQS
271 - KIFIK
272 - IFI
280 - EME
283 - NKN
289 - LFL
>TRE|Q8VRL9|Q8VRL9 (492 AA) SiaD [Neisseria meningitidis]
MLQKIRKALFHPKKFFQDSQWFATPLFSSFAPKSNLFIISTFAQLNQAHSLTKMQKLKNNLLVILYTTQNMKMPKLIQKSVDKELFSVTYMFELPRKPGIVSPKKFLYIQRGYKKLLKTIQPAHLYVMSFAGHYSSLLSLAKKMNITTHLVEEGTATYAPLLESFTYKPTKFEQRFVGNNLHQKGYFDKFDILHVAFPEYAKKIFNANEYHRFFAHSGGISTSQSIAKIQDKYRISQNDY
26 - FSSF
55 - KLK
70 - MKM
114 - KLLK
135 - SLLS
137 - LSL
154 - TAT
205 - NAN
220 - STS
222 - SQS
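The script above reads records in paragraph mode (`$/ = ''`), which expects a blank line between sequences. If the input file has no blank lines, one possible variant is to start a new record at every line beginning with `>`. The following is only a sketch under that assumption: the helper names `read_fasta` and `find_palindromes`, the `$min_len` parameter, and the in-memory demo data are all invented for illustration, while the pattern itself is the one from the answer.

```perl
use strict;
use warnings;

# Same recursive pattern as in the answer. It relies on being interpolated
# inside an outer capture group, which then becomes group 1.
my $pp = qr/(?: (\w) (?1) \g{-1} | \w? )/ix;

# Hypothetical helper: read FASTA records from a filehandle, starting a new
# record at every line that begins with '>'. Blank lines are ignored.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        if ($line =~ /^>/) {
            push @records, { header => $line, seq => '' };
        }
        elsif (@records) {
            $records[-1]{seq} .= $line;    # join wrapped sequence lines
        }
    }
    return @records;
}

# Hypothetical helper: return [offset, palindrome] pairs with 0-based
# offsets, keeping only palindromes of at least $min_len letters.
sub find_palindromes {
    my ($seq, $min_len) = @_;
    my @hits;
    while ($seq =~ /(?=($pp))/g) {
        push @hits, [ $-[0], $1 ] if length($1) >= $min_len;
    }
    return @hits;
}

# Demo on an in-memory filehandle; for a real file, use
# open(my $fh, '<', $path) with your own path instead.
my $fasta = ">demo1\nMIFDASLKKL\nRKLF\n>demo2\nGTGTYKY\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory file: $!";
for my $rec (read_fasta($fh)) {
    print "$rec->{header}\n";
    printf "%d - %s\n", @$_ for find_palindromes($rec->{seq}, 3);
}
close $fh;
```

If 1-based positions are preferred, as is common for residue numbering, add 1 to the offset when printing.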