I am learning Python and wrote a function to calculate the GC percentage of a DNA sequence that contains the undefined character N or n (for example NAAATTTGGGCCCN), and I ran into a problem with it. Is there a way to overcome this?
def gc(sequence):
    "This function computes the GC percentage of a dna sequence"
    nbases = sequence.count('n') + sequence.count('N')
    gc_count = sequence.count('c') + sequence.count('C') + sequence.count('g') + sequence.count('G')  # total GC count
    gc_percent = float(gc_count) / (len(sequence-nbases))  # total GC count divided by (total length of the sequence - total number of N)
    return 100 * gc_percent
def GC_content(dnaseq):
    percent = round(((dnaseq.count("C") + dnaseq.count("G")) / len(dnaseq)) * 100, 3)
    print(f'GC content: {percent} %')
Here is some code I had lying around for the same thing. I have mine round to 3 decimal places just for consistency in my program. I would also call sequence.upper() on the input first, so you avoid having to hard-code both lower- and upper-case letters.
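For the N question specifically: in the original function, `len(sequence-nbases)` tries to subtract an integer from a string, which Python rejects with a TypeError. The subtraction needs to happen after taking the length, as in `len(sequence) - nbases`. Below is a minimal sketch that combines that fix with the `upper()` idea. The name `gc_percent` and the choice to return 0.0 when the sequence has no defined bases are my own assumptions, not something from the code above.

```python
def gc_percent(sequence):
    """Return the GC percentage of a DNA sequence, ignoring undefined bases (N/n)."""
    seq = sequence.upper()            # normalise case once instead of counting both cases
    n_count = seq.count("N")          # undefined bases
    defined = len(seq) - n_count      # subtract AFTER taking the length
    if defined == 0:
        return 0.0                    # assumption: report 0.0 if there are no defined bases
    gc_count = seq.count("G") + seq.count("C")
    return 100 * gc_count / defined


print(gc_percent("NAAATTTGGGCCCN"))  # 50.0
```

With the example sequence there are 14 characters, 2 of them N, so the 6 G/C bases are divided by 12 rather than 14. You can wrap the result in `round(..., 3)` if you want the same 3-decimal output as above.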