def batch_iterator(iterator, batch_size) :
    entry = True
    while entry :
        batch = []
        while len(batch) < batch_size :
            try :
                entry = iterator.__next__
            except StopIteration :
                entry = None
            if entry is None :
                #End of file
                break
            batch.append(entry)
        if batch :
            yield batch

from Bio import SeqIO
record_iter = SeqIO.parse(open("C:\\Users\\IDEAPAD\Desktop\\fypsplit\\protein.fasta"),"fasta")
for i, batch in enumerate(batch_iterator(record_iter, 1000)):
    filename = "group_%i.fasta" % (i + 1)
    with open(filename, "w") as handle:
        count = SeqIO.write(batch, handle, "fasta")
    print("Wrote %i records to %s" % (count, filename))

I am trying to split a FASTA file using Biopython. In this example I want to end up with about 7 files, but I am getting this error: AttributeError: 'function' object has no attribute 'id'.
Can someone help me? Thank you in advance.

The AttributeError is thrown in this line
count = SeqIO.write(batch, handle, "fasta")
because SeqIO.write expects an iterable or list of type SeqRecord. However, your batch_iterator produces a list of methods instead.
Why methods? Well, you are missing a function call here:
entry = iterator.__next__
should be
entry = iterator.__next__()
This makes the code run through without error.
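To make the difference concrete, here is a small sketch using a plain list iterator as a stand-in for the SeqIO.parse iterator (the names it, ref and item are only for illustration). Without parentheses you get a reference to the method itself; with parentheses you get the next item.

```python
# Stand-in iterator; in the question this would be the SeqIO.parse(...) result.
it = iter(["a", "b"])

ref = it.__next__       # no parentheses: a reference to the method, nothing is consumed
print(callable(ref))    # True

item = it.__next__()    # with parentheses: the method is called and returns the first item
print(item)             # a

print(next(it))         # b (the built-in next() is the more idiomatic spelling)
```

Because the reference is never None and never raises StopIteration, the original loop just keeps appending the same method object to the batch until it is full.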
For a test file consisting of 11 sequences, I got the following result - after changing the batch size from 1000 to 4 for testing purposes:
Wrote 4 records to group_1.fasta
Wrote 4 records to group_2.fasta
Wrote 3 records to group_3.fasta
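
As an aside, the same batching can probably be written more compactly with itertools.islice. This is only a sketch of an alternative, not the version from the question, and it uses range(11) as dummy data in place of real SeqRecord objects so that it runs without Biopython.

```python
from itertools import islice

def batch_iterator(iterator, batch_size):
    """Yield lists of up to batch_size items from any iterable."""
    iterator = iter(iterator)  # accept lists, generators, parsers, ...
    while True:
        # islice takes at most batch_size items and stops early at the end
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch

# Dummy data: 11 items in batches of 4, mirroring the test above
sizes = [len(batch) for batch in batch_iterator(range(11), 4)]
print(sizes)  # [4, 4, 3]
```

With Biopython installed, passing the SeqIO.parse(...) iterator instead of range(11) should give lists of SeqRecord objects that SeqIO.write accepts.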