I have a FASTA file containing papillomavirus sequences (entire genomes, partial CDS, ...) and I'm using Biopython to retrieve the entire genomes (around 7 kb) from this file. Here's my code:

    from Bio import SeqIO

    rec_dict = SeqIO.index("hpv_id_name_all.fasta", "fasta")
    c = 0  # number of records visited so far
    for k in rec_dict.keys():
        c = c + 1
        if len(rec_dict[k].seq) > 7000:
            handle = open(rec_dict[k].description + "_" + str(len(rec_dict[k].seq)) + ".fasta", "w")
            handle.write(">" + rec_dict[k].description + "\n" + str(rec_dict[k].seq) + "\n")
            handle.close()

I'm using SeqIO.index, which returns a dictionary-like object, to avoid loading everything into memory. The variable "c" tells me how many iterations run before THIS error pops up:

    Traceback (most recent call last):
      File "<stdin>", line 4, in <module>
    IOError: [Errno 2] No such file or directory: 'EU410347.1|Human papillomavirus FA75/KI88-03_7401.fasta'

When I print the value of "c", I get 9013, while the file contains 10447 sequences, so the for loop didn't go through all of them (c is incremented before the "if" condition, so it counts every iteration, not only those that match the condition). I also don't understand the IOError: opening in "w" mode should create the 'EU410347.1|Human papillomavirus FA75/KI88-03_7401.fasta' file rather than check that it exists, shouldn't it?
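The file you were trying to create, 'EU410347.1|Human papillomavirus FA75/KI88-03_7401.fasta', contains a slash ('/'), which the operating system treats as a path separator: the name is read as a directory 'EU410347.1|Human papillomavirus FA75' followed by a filename 'KI88-03_7401.fasta'. Opening in "w" mode creates the file, but not any missing directory leading to it, so the call fails with errno 2 because that directory does not exist. The exception also ends the loop, which is why c stops at 9013 instead of reaching 10447.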
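To see this concretely, here is a small standard-library sketch, assuming a POSIX system (Linux/macOS). It splits the offending name with posixpath.split, which uses '/' as the separator, and then tries to open that name inside a throwaway temporary directory (scaffolding for the demo only). The open fails with errno 2 (ENOENT); in Python 3 this is raised as FileNotFoundError, a subclass of OSError, which IOError is now an alias of.

```python
import errno
import os
import posixpath
import tempfile

name = "EU410347.1|Human papillomavirus FA75/KI88-03_7401.fasta"

# Split on "/" the way a POSIX system does when it resolves the path
directory, filename = posixpath.split(name)
print(directory)  # EU410347.1|Human papillomavirus FA75
print(filename)   # KI88-03_7401.fasta

# Mode "w" creates the file itself, but not a missing parent directory
error_code = None
with tempfile.TemporaryDirectory() as tmp:
    try:
        with open(os.path.join(tmp, name), "w") as handle:
            handle.write(">test\n")
    except FileNotFoundError as exc:  # the old IOError with errno 2
        error_code = exc.errno
print(error_code == errno.ENOENT)  # True on Linux/macOS
```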
You may want to replace the slash with something else, such as an underscore:

    handle = open(rec_dict[k].description.replace('/', '_') + "_" + str(len(rec_dict[k].seq)) + ".fasta", "w")
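If other awkward characters might show up in the descriptions, a more defensive variant is to whitelist characters with a regular expression. The helper below, safe_filename, is a hypothetical name used only for illustration (it is not part of Biopython). It keeps letters, digits, '.', '_' and '-' and turns every other run of characters into '_', which also removes '|', a character that Windows does not allow in filenames.

```python
import re

# Hypothetical helper: any run of characters outside this whitelist becomes "_"
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")

def safe_filename(description, length):
    """Build '<description>_<length>.fasta' with path-unsafe characters replaced."""
    # Also strip leading/trailing "." and "_" so names like ".." cannot appear
    cleaned = _UNSAFE.sub("_", description).strip("._")
    return "{}_{}.fasta".format(cleaned, length)

print(safe_filename("EU410347.1|Human papillomavirus FA75/KI88-03", 7401))
# EU410347.1_Human_papillomavirus_FA75_KI88-03_7401.fasta
```

In the question's loop it would be called as open(safe_filename(rec_dict[k].description, len(rec_dict[k].seq)), "w"). Note that two different descriptions can map to the same cleaned name, so if that matters, including the record ID (k) in the filename keeps them distinct.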