I need to write a function that prints the longest palindromic substring of a piece of DNA. I have already written a function that checks whether a piece of DNA is itself a palindrome; see below.
def make_complement_strand(DNA):
    complement = []
    rules_for_complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    for letter in DNA:
        complement.append(rules_for_complement[letter])
    return complement

def is_this_a_palindrome(DNA):
    DNA = list(DNA)
    if DNA != make_complement_strand(DNA)[::-1]:
        print("false")
        return False
    else:
        print("true")
        return True

is_this_a_palindrome("GGGCCC")
But how do I now write a function that prints the longest palindromic substring of a DNA string?
The meaning of palindrome in the context of genetics is slightly different from the definition used for words and sentences. Since a double helix is formed by two paired strands of nucleotides that run in opposite directions in the 5’-to-3’ sense, and the nucleotides always pair in the same way (Adenine (A) with Thymine (T) for DNA, with Uracil (U) for RNA; Cytosine (C) with Guanine (G)), a (single-stranded) nucleotide sequence is said to be a palindrome if it is equal to its reverse complement. For example, the DNA sequence ACCTAGGT is palindromic because its nucleotide-by-nucleotide complement is TGGATCCA, and reversing the order of the nucleotides in the complement gives the original sequence.
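To make the definition concrete, here is a minimal sketch of a reverse-complement check on plain strings. The names `COMPLEMENT`, `reverse_complement` and `is_reverse_palindrome` are made up for this illustration, and it assumes the input contains only upper-case A, C, G and T. Note that `str.translate` leaves any other character unchanged rather than raising an error.

```python
# Translation table for DNA bases (assumes upper-case A, C, G, T only).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    # Complement every base, then reverse the result.
    return dna.translate(COMPLEMENT)[::-1]

def is_reverse_palindrome(dna):
    """Return True if dna equals its own reverse complement."""
    return dna == reverse_complement(dna)

print(reverse_complement("ACCTAGGT"))     # ACCTAGGT
print(is_reverse_palindrome("ACCTAGGT"))  # True
print(is_reverse_palindrome("GAGCTT"))    # False
```

Working on strings directly avoids the conversion to a list, so the check can be applied to slices of a longer sequence without extra steps.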
Here is a decent starting point for finding the longest palindromic substring.
def make_complement_strand(DNA):
    complement = []
    rules_for_complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    for letter in DNA:
        complement.append(rules_for_complement[letter])
    return complement

def is_this_a_palindrome(DNA):
    DNA = list(DNA)
    if DNA != make_complement_strand(DNA)[::-1]:
        #print("false")
        return False
    else:
        #print("true")
        return True

def longest_palindrome_ss(org_dna, palindrome_func):
    '''
    Naive implementation:
    We use two pointers.
    i marks the start of the current substring and j runs from i+1 to the end;
    i is incremented on every pass of the outer loop.
    Uses the palindrome function provided by the caller.
    Further improvements:
    1. Start with the longest substring instead of the shortest, i.e. start with i=0 and j=final_i and decrement.
    '''
    longest_palin = ""
    i = j = 0
    last_i = len(org_dna)
    while i < last_i:
        j = i + 1
        while j < last_i:
            # Substring from index i to index j, inclusive.
            current_subsequence = org_dna[i:j+1]
            if palindrome_func(current_subsequence):
                # Keep only strictly longer matches, so ties go to the leftmost one.
                if len(current_subsequence) > len(longest_palin):
                    longest_palin = current_subsequence
            j += 1
        i += 1
    print(org_dna, longest_palin)
    return longest_palin

longest_palindrome_ss("GGGCCC", is_this_a_palindrome)
longest_palindrome_ss("GAGCTT", is_this_a_palindrome)
longest_palindrome_ss("GGAATTCGA", is_this_a_palindrome)
Here is the output of a sample run:
mahorir@mahorir-Vostro-3446:~/Desktop$ python3 dna_paln.py
GGGCCC GGGCCC
GAGCTT AGCT
GGAATTCGA GAATTC
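The improvement mentioned in the docstring (trying the longest substrings first) could be sketched roughly as follows. This is only one possible way to do it: the function names `longest_palindrome_longest_first` and `is_reverse_palindrome` are invented for this example, and the checker assumes upper-case A, C, G and T. Because window lengths are tried in decreasing order, the first match found is already a longest one and the search can stop there.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def is_reverse_palindrome(dna):
    """Return True if dna equals its reverse complement."""
    return dna == dna.translate(COMPLEMENT)[::-1]

def longest_palindrome_longest_first(dna, is_palindrome):
    """Return the leftmost longest palindromic substring (length >= 2), or ""."""
    n = len(dna)
    # Try window lengths from the whole string down to 2.
    for length in range(n, 1, -1):
        # Slide a window of this length from left to right.
        for start in range(n - length + 1):
            candidate = dna[start:start + length]
            if is_palindrome(candidate):
                # No longer window matched, so this is a longest one.
                return candidate
    return ""

for dna in ("GGGCCC", "GAGCTT", "GGAATTCGA"):
    print(dna, longest_palindrome_longest_first(dna, is_reverse_palindrome))
```

The worst case still examines every substring, but sequences that contain a long palindrome return early.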