Tags: python, python-3.x, data-science, bioinformatics, dna-sequence

Python and user-defined functions


I am learning Python and wrote a function to calculate the GC percentage of a DNA sequence that contains the undefined character N or n (for example, NAAATTTGGGCCCN), and it produced the following problem. Is there a way to overcome this?

def gc(sequence) :
    "This function computes the GC percentage of a dna sequence"
    nbases=sequence.count('n')+sequence.count('N')
    gc_count=sequence.count('c')+sequence.count('C')+sequence.count('g')+sequence.count('G')      #total gc count
    gc_percent=float(gc_count)/(len(sequence-nbases))     # TOTAL GC COUNT DIVIDED BY TOTAL LEN OF THE sequence-TOTAL NO. OF N
    return 100 * gc_percent

Solution

  • def GC_content(dnaseq):
        percent = round(((dnaseq.count("C") + dnaseq.count("G")) / len(dnaseq)) * 100, 3)
        print(f'GC content: {percent} %')
    

    Here is some code I had lying around for the same thing. Mine rounds to 3 decimal places, just for consistency in my program. I would also call something like sequence.upper() first, so you avoid having to hard-code both lower- and upper-case letters.
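
For the N handling asked about in the question, note that `len(sequence-nbases)` tries to subtract an integer from a string, which raises a TypeError; the subtraction has to happen after `len()` is called. The snippet in the answer also divides by the full length, so N bases would still count toward the denominator. Below is a minimal sketch that combines the two ideas, assuming N bases should be excluded from the denominator. The name `gc_percent` and the choice to return 0.0 when no called bases remain are assumptions for illustration, not part of either post.

```python
def gc_percent(sequence):
    """Return the GC percentage of a DNA sequence, ignoring N bases."""
    seq = sequence.upper()  # normalise case so each base is counted once
    n_bases = seq.count("N")
    gc_count = seq.count("G") + seq.count("C")
    called = len(seq) - n_bases  # subtract after len(), not inside it
    if called == 0:
        return 0.0  # assumption: report 0.0 when there are no called bases
    return 100 * gc_count / called


print(gc_percent("NAAATTTGGGCCCN"))  # 50.0
```

The function returns the value instead of printing it, so the caller can round or format it as needed, for example `round(gc_percent(seq), 3)`.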