I have a file with several names such :
seq1 seq9 seq3 seq7 seq5 seqi seqn....
and another fasta file with all my sequences, and what I need to do is to order my sequences by the order of the list above:such as:
>seq1
aaaaa
>seq9
aaaaa
>seq3
aaaaa
>seq7
aaaaa
>seq5
aaaaa
...
I tried this:
input_file = open('concatenate_0035_0042_aa2.fa','r')
output_file = open('result.fasta','a')
liste=['seq1','seq5','seq8' etc]
print(len(liste))
compteur=1
for i in liste:
record_dict = SeqIO.to_dict(SeqIO.parse("concatenate_0035_0042_aa2.fa", "fasta"))
print(">",record_dict[i].id,file=output_file,sep="")
print(record_dict[i].seq,file=output_file)
compteur+=1
print(compteur)
output_file.close()
input_file.close()
but it actually takes too much time.
The reason that your current code takes too much time, is because for each sequence id in your list, you parse your fasta file and convert it to a dict. Certainly if your fasta file is large, this is an expensive computation. So only do it once:
from Bio import SeqIO
ids = ['seq1', 'seq9', 'seq3', 'seq7', 'seq5']
with open('concatenate_0035_0042_aa2.fa') as seqs, open('result.fasta', 'w') as result:
record_dict = SeqIO.to_dict(SeqIO.parse(seqs, 'fasta'))
result_records = [record_dict[id_] for id_ in ids]
SeqIO.write(result_records, result, "fasta")
the with open(...)
statement takes care of automatically closing the file for you when you're done.