IOError while retrieving sequences from fasta file using biopython


I have a FASTA file containing papillomavirus sequences (entire genomes, partial CDS, ...) and I'm using Biopython to extract the entire genomes (around 7 kb) from this file. Here's my code:

from Bio import SeqIO

rec_dict = SeqIO.index("hpv_id_name_all.fasta", "fasta")
c = 0  # counts iterations, to see how far the loop gets
for k in rec_dict.keys():
    c = c + 1
    if len(rec_dict[k].seq) > 7000:
        handle = open(rec_dict[k].description + "_" + str(len(rec_dict[k].seq)) + ".fasta", "w")
        handle.write(">" + rec_dict[k].description + "\n" + str(rec_dict[k].seq) + "\n")
        handle.close()

I'm using an indexed dictionary to avoid loading everything into memory. The variable "c" counts how many iterations are made before this error pops up:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
IOError: [Errno 2] No such file or directory: 'EU410347.1|Human papillomavirus FA75/KI88-03_7401.fasta'

When I print the value of "c", I get 9013, while the file contains 10447 sequences, so the for loop didn't go through all of them (the count is incremented before the "if" condition, so it counts every iteration, not only those that match the condition). I don't understand the input/output error: shouldn't it create the 'EU410347.1|Human papillomavirus FA75/KI88-03_7401.fasta' file instead of checking whether it exists?


Solution

  • The name of the file you were trying to create, 'EU410347.1|Human papillomavirus FA75/KI88-03_7401.fasta', contains a slash ('/'). A slash is a path separator, so the name is interpreted as a directory 'EU410347.1|Human papillomavirus FA75' followed by a filename 'KI88-03_7401.fasta'. Opening a file in "w" mode creates the file but not any missing parent directories, so Python complains that the directory does not exist.

    You may want to replace the slash with something else, such as

    handle=open(rec_dict[k].description.replace('/', '_')+"_"+str(len(rec_dict[k].seq))+".fasta","w")
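
Other characters in a FASTA description can also cause trouble in file names (for example, '|' is not allowed on Windows). As a minimal sketch, a small helper could clean the whole description before it is used as a file name. The name `safe_filename` and the exact set of replaced characters are assumptions made for illustration, not part of Biopython:

```python
import re

def safe_filename(description, length, ext=".fasta"):
    """Build a file name from a FASTA description (hypothetical helper)."""
    # Collapse path separators, whitespace and other characters that are
    # problematic in file names into single underscores.
    cleaned = re.sub(r'[\\/|:*?"<>\s]+', "_", description).strip("_")
    return f"{cleaned}_{length}{ext}"

# The description from the traceback, with its sequence length
name = safe_filename("EU410347.1|Human papillomavirus FA75/KI88-03", 7401)
print(name)  # EU410347.1_Human_papillomavirus_FA75_KI88-03_7401.fasta
```

In the loop from the question, the call would become open(safe_filename(rec_dict[k].description, len(rec_dict[k].seq)), "w"). Note that two different descriptions could map to the same cleaned name, in which case the later file would overwrite the earlier one.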