Improve genbank feature addition


I am trying to add more than 70,000 new features to a GenBank file using Biopython.

I have this code:

from Bio import SeqIO
from Bio.SeqFeature import SeqFeature, FeatureLocation

fi = "myoriginal.gbk"
fo = "mynewfile.gbk"

for result in results:
     start = 0
     end = 0

     result = result.split("\t")
     start = int(result[0])
     end = int(result[1])

     for record in SeqIO.parse(fi, "gb"):
         record.features.append(SeqFeature(FeatureLocation(start, end), type = "misc_feat"))
         SeqIO.write(record, fo, "gb")

results is just a list of lists holding the start and end of each feature I need to add to the original gbk file.

This solution is extremely slow on my computer and I do not know how to improve its performance. Any ideas?


Solution

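  • You should parse the GenBank file just once, and write the output just once as well. Leaving aside what results contains (I cannot tell exactly, because some pieces of code are missing from your example), I would guess that something like this, adapted from your code, would improve performance:

    from Bio import SeqIO
    from Bio.SeqFeature import SeqFeature, FeatureLocation

    fi = "myoriginal.gbk"
    fo = "mynewfile.gbk"
    
    # Parse the file once and keep the records in memory.
    original_records = list(SeqIO.parse(fi, "gb"))
    
    for result in results:
        # Assumes each result is a tab-separated "start<TAB>end" string.
        result = result.split("\t")
        start = int(result[0])
        end = int(result[1])
    
        for record in original_records:
            record.features.append(SeqFeature(FeatureLocation(start, end), type="misc_feat"))
    
    # Write every record once, after all features have been added.
    SeqIO.write(original_records, fo, "gb")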
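
The question describes results as a list of lists, while the code splits each entry on tabs, so it may help to normalise both shapes first. Below is a minimal sketch under that assumption. parse_intervals and add_features are hypothetical helper names, not part of Biopython. The sketch reads the input once, adds every feature in memory, and calls SeqIO.write a single time, because writing to a filename overwrites that file on every call. Note that FeatureLocation uses 0-based, end-exclusive coordinates, so 1-based starts coming from another tool would need adjusting.

```python
def parse_intervals(results):
    """Turn results into a list of (start, end) integer tuples.

    Each entry may be a tab-separated string such as "10\t20" or a
    sequence such as [10, 20]; which one applies is an assumption,
    since the question does not show how results is built.
    """
    intervals = []
    for result in results:
        fields = result.split("\t") if isinstance(result, str) else result
        intervals.append((int(fields[0]), int(fields[1])))
    return intervals


def add_features(fi, fo, results, feature_type="misc_feat"):
    """Add one feature per interval to every record in fi and write fo once."""
    # Imported here so parse_intervals can be used without Biopython installed.
    from Bio import SeqIO
    from Bio.SeqFeature import SeqFeature, FeatureLocation

    intervals = parse_intervals(results)

    # Parse the input file a single time.
    records = list(SeqIO.parse(fi, "gb"))

    for record in records:
        record.features.extend(
            SeqFeature(FeatureLocation(start, end), type=feature_type)
            for start, end in intervals
        )

    # A single write call; returns the number of records written.
    return SeqIO.write(records, fo, "gb")
```

Usage would then be a single call such as add_features("myoriginal.gbk", "mynewfile.gbk", results). Every interval is added to every record in the file, as in the code above; if the input holds several records and each feature belongs to only one of them, the intervals would need to be filtered per record.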