I'm having a problem with my code. I'm trying to iterate over the list of genes in a GenBank file using BioPython. Here's what it looks like:
class genBank:
    gbProtId = str()
    gbStart = int()
    gbStop = int()
    gbStrand = int()
genBankEntries = list()
for seq_record in SeqIO.parse(genBankFile, "genbank"):
    for seq_feature in seq_record.features:
        genBankEntry = genBank
        if seq_feature.type == "CDS":
            genBankEntry.gbProtId = seq_feature.qualifiers['protein_id']
            genBankEntry.gbStart = seq_feature.location.start # prodigal GFF3 output is 1 based indexing
            genBankEntry.gbStop = seq_feature.location.end
            genBankEntry.gbStrand = seq_feature.strand
            genBankEntries.append(genBankEntry)
It looks like it should work, but when I run it, the resulting list genBankEntries has one element per gene in the GenBank file, and every element holds only the values from the final feature in seq_record.features:
00 = {type} <class '__main__.genBank'>
    gbProtId = {list} ['BAA31840.1']
    gbStart = {ExactPosition} 90649
    gbStop = {ExactPosition} 91648
    gbStrand = {int} 1
...
82 = {type} <class '__main__.genBank'>
    gbProtId = {list} ['BAA31840.1']
    gbStart = {ExactPosition} 90649
    gbStop = {ExactPosition} 91648
    gbStrand = {int} 1
This is especially confusing because both for-loops seem to work correctly:
for seq_record in SeqIO.parse(genBankFile, "genbank"):
    for seq_feature in seq_record.features:
        print(seq_feature)
Why is this?
You are never creating any instances of the genBank class. The assignment genBankEntry = genBank binds the name to the class itself, so each loop iteration changes class-level attributes of genBank, and you append the same object to the list each time. Each pass through the loop overwrites the values from the previous pass.
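To see the effect in isolation, here is a minimal sketch that does not need BioPython. The class name Entry and the IDs are made up for illustration; they stand in for genBank and the CDS features.

```python
class Entry:
    # Class-level attribute: there is exactly one of these, on the class
    prot_id = ""

entries = []
for prot_id in ["A1", "B2", "C3"]:  # invented IDs standing in for CDS features
    entry = Entry                   # no parentheses: this is the class, not an instance
    entry.prot_id = prot_id         # overwrites the single class attribute
    entries.append(entry)           # appends the same class object again

all_same_object = all(e is Entry for e in entries)
seen_ids = [e.prot_id for e in entries]
print(all_same_object)  # True
print(seen_ids)         # ['C3', 'C3', 'C3']
```

Every list element is the class object itself, so they all show whatever was assigned last.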
In the first line of your inner loop, add parentheses to call the type and create an instance of genBank. The line becomes genBankEntry = genBank(). This creates a new, distinct object on each pass through the loop.
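As a rough sketch of the per-instance approach, you could also move the fields into __init__ so each object carries its own values. The class name GenBankEntry, the attribute names, and the feature tuples below are hypothetical stand-ins for what BioPython would give you, so the example runs without a GenBank file.

```python
class GenBankEntry:
    def __init__(self, prot_id, start, stop, strand):
        # Instance attributes: each object gets its own copy
        self.prot_id = prot_id
        self.start = start
        self.stop = stop
        self.strand = strand

# Hypothetical (type, protein_id, start, end, strand) tuples standing in
# for seq_record.features; the values are invented.
features = [
    ("CDS", "P1", 0, 300, 1),
    ("gene", None, 0, 300, 1),
    ("CDS", "P2", 350, 900, -1),
]

entries = []
for ftype, prot_id, start, end, strand in features:
    if ftype == "CDS":
        # Calling the class creates a fresh object on every pass
        entries.append(GenBankEntry(prot_id, start, end, strand))

ids = [e.prot_id for e in entries]
print(ids)  # ['P1', 'P2']
```

Creating the object inside the if also avoids building an entry for features that are not CDS.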