So I have a DNA sequence
DNA = "TANNNT"
where N = ["A", "G", "C", "T"]
I want to generate every possible output: TAAAAT, TAAAGT, TAAACT, TAAATT, and so on.
Searching online, I found a solution using permutations, where I can do
perms = [''.join(p) for p in permutations(N, 3)]
then build each sequence as
"TA" + p + "T" for every p in perms
but I wonder if there is an easier way to do this, because I have many more DNA sequences and hard-coding each one would take a lot of time.
Edit:
By hard-coding I mean that I would have to write
N1 = [''.join(p) for p in permutations(N, 1)]
N2 = [''.join(p) for p in permutations(N, 2)]
N3 = [''.join(p) for p in permutations(N, 3)]
and then do
for p in N3:
    key = "TA" + p + "T"
Since my sequences are quite long, I don't want to count how many consecutive Ns each one contains, and I would like to know if there is a better way to do this.
You can find the run of N with a regular expression and use your permutation results to format the string, like this:
Code:
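import itertools as it
import re

def convert_sequence(base_string, target_letter, perms):
    # Match a run of one or more consecutive target letters, e.g. 'NNN'.
    REGEX = re.compile('(%s+)' % target_letter)
    # The first run found sets how many letters each permutation needs.
    match = REGEX.search(base_string).group(0)
    # Replace the run with a '%s' placeholder: 'TANNNT' -> 'TA%sT'.
    pattern = REGEX.sub('%s', base_string)
    # Fill the placeholder with every ordered selection of distinct letters.
    return [pattern % ''.join(p) for p in it.permutations(perms, len(match))]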
Test Code:
print(convert_sequence('TANNNT', 'N', ['A', 'G', 'C', 'T']))
Results (24 sequences; permutations never repeats a letter, so strings such as TAAAAT are not produced):
['TAAGCT', 'TAAGTT', 'TAACGT', 'TAACTT', 'TAATGT',
'TAATCT', 'TAGACT', 'TAGATT', 'TAGCAT', 'TAGCTT',
'TAGTAT', 'TAGTCT', 'TACAGT', 'TACATT', 'TACGAT',
'TACGTT', 'TACTAT', 'TACTGT', 'TATAGT', 'TATACT',
'TATGAT', 'TATGCT', 'TATCAT', 'TATCGT']
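If you also need outputs with repeated letters, such as the TAAAAT from the question, one possible variation is to swap itertools.permutations for itertools.product with a repeat count. The sketch below is only an illustration and not part of the code above: the name expand_sequence and its default arguments are hypothetical, and it assumes target_letter is a single character and that the sequence contains no '{' or '}'. Because it counts the wildcard positions itself, the Ns do not have to be consecutive.

```python
import itertools as it


def expand_sequence(base_string, target_letter="N", alphabet="AGCT"):
    """Replace every target_letter with each letter of alphabet.

    Hypothetical helper: letters may repeat, and the wildcard
    positions may be scattered anywhere in base_string.
    """
    # Number of wildcard positions to fill.
    n_slots = base_string.count(target_letter)
    # Turn each wildcard into a '{}' placeholder: 'TANNNT' -> 'TA{}{}{}T'.
    template = base_string.replace(target_letter, "{}")
    # product(..., repeat=k) yields every k-letter tuple, repeats included.
    return [template.format(*combo)
            for combo in it.product(alphabet, repeat=n_slots)]


results = expand_sequence("TANNNT")
print(len(results))   # 4 ** 3 = 64 sequences
print(results[:4])    # ['TAAAAT', 'TAAAGT', 'TAAACT', 'TAAATT']
```

Keep in mind that the list grows as 4 ** k for k wildcard positions, so for long sequences with many Ns it may be better to iterate over it.product directly instead of building the whole list.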