python bioinformatics genome

How to get fragments from a DNA sequence


I want to cut a DNA genome into fragments of any k-mer size, so I created the function Sliding_DNA(dna_list,size_to_split), but it doesn't work.

Can somebody help me?

When I print out the variable pedazos, it gives me the following:

'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC', 'CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC', 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA', 'TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT', 'TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT']

Code:

def Sliding_DNA(dna_list,size_to_split):
    # range over which to slide
    #vecesRecorrer = int(len(dna_list) / 500)
    lista_temp = []

    #dna_to_split = dna_list[0]
    #print(dna_to_split)
    posiInicial = 0
    posiFinal = 0
    test = 'AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACCATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGGCTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGGGTTGCCGATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTGAAAAAACCATT'

    for nucleotide in test:
        pedazo = ""
        posiFinal = posiInicial + size_to_split
        for posiInicial in xrange(posiFinal):
            pedazo += nucleotide
            if len(pedazo)==size_to_split:
                lista_temp.append(pedazo)
        posiInicial += size_to_split

    return lista_temp


pedazos = Sliding_DNA(dna_list,100)

Solution

  • The problem is caused by this line:

    pedazo += posiInicial
    

    You assigned an empty string to pedazo, so it is a str, while posiInicial holds an int. Python cannot apply + to a str and an int, so it raises a TypeError.

    So initialize pedazo to 0 instead:

    pedazo = 0
    cont += 1
    posiFinal = posiInicial + 500
    for posiInicial in xrange(posiFinal):
        pedazo += posiInicial
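
Separately from the type issue, the output in the question (fragments made of a single repeated base) comes from the inner loop appending the same nucleotide size_to_split times instead of reading consecutive positions of the sequence. A minimal sketch of one way to get the fragments is to slice the string directly. The name sliding_dna and the step parameter are hypothetical, not from the original code, and the sketch assumes the sequence is a plain str and that a trailing fragment shorter than size should be dropped. It uses range, which works in both Python 2 and Python 3.

```python
def sliding_dna(sequence, size, step=None):
    """Return fragments of `size` bases, starting every `step` bases.

    step=None (assumed default) gives non-overlapping chunks (step == size);
    step=1 gives every overlapping k-mer. A trailing piece shorter than
    `size` is dropped.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    if step is None:
        step = size
    if step <= 0:
        raise ValueError("step must be a positive integer")
    # Each start index i yields the slice sequence[i:i + size]; stopping at
    # len(sequence) - size + 1 guarantees every slice has exactly `size` bases.
    return [sequence[i:i + size]
            for i in range(0, len(sequence) - size + 1, step)]


# Non-overlapping fragments of 4 bases
print(sliding_dna("AGCTTTTCATTC", 4))        # ['AGCT', 'TTTC', 'ATTC']
# Overlapping 3-mers, moving one base at a time
print(sliding_dna("AGCTT", 3, step=1))       # ['AGC', 'GCT', 'CTT']
```

With the 100-base fragments from the question, the call would look like sliding_dna(test, 100), where test is the sequence string defined in the original function.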