I am trying to filter out sequences using SeqIO but I am getting this error.
Traceback (most recent call last):
File "paralog_warning_filter.py", line 61, in <module>
.
.
.
SeqIO.write(desired_proteins, "filtered.fasta","fasta")
AttributeError: 'str' object has no attribute 'id'
I checked other similar questions but still couldn't understand what is wrong with my script.
Here is the relevant part of the script I am trying:
fh=open('lineageV_paralog_warning_genes.fasta')
for s_record in SeqIO.parse(fh,'fasta'):
    name = s_record.id
    seq = s_record.seq
    for i in paralogs_in_all:
        if name.endswith(i):
            desired_proteins=seq
            output_file=SeqIO.write(desired_proteins, "filtered.fasta","fasta")
            output_file
fh.close()
I have a separate paralogs_in_all list, which is the source of the IDs. When I print name, it returns proper string IDs in this format: >coronopifolia_tair_real-AT2G35040.1@10
Can you help me understand my problem? Thanks in advance.
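SeqIO.write expects SeqRecord objects, but desired_proteins=seq hands it a Seq. Iterating over a Seq yields single-character strings, which have no id attribute, hence the error. Try this and let us know whether it works (I can't test your code):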
from Bio.SeqRecord import SeqRecord
from Bio import SeqIO
......
.......
desired_proteins = []
fh=open('lineageV_paralog_warning_genes.fasta')
for s_record in SeqIO.parse(fh,'fasta'):
    name = s_record.id
    seq = s_record.seq  # already a Seq object, so there is no need to wrap it in Seq()
    for i in paralogs_in_all:
        if name.endswith(i):
            # description="" removes the <unknown description> that would otherwise be written
            desired_proteins.append(SeqRecord(seq, id=name, description=""))
            break  # stop at the first matching suffix so a record is not added twice
fh.close()
# write all collected records in one call, after the loop, so the file is not overwritten on every match
output_file=SeqIO.write(desired_proteins, "filtered.fasta","fasta")
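As a side note, the inner loop over paralogs_in_all may not be needed, because Python's str.endswith also accepts a tuple of suffixes. Here is a minimal sketch of just the matching step. The IDs and suffixes below are made up for illustration (only the first ID comes from the question), so adjust them to your data:

```python
# Hypothetical data for illustration; in the real script the suffixes come
# from paralogs_in_all and each name comes from s_record.id.
paralogs_in_all = ["AT2G35040.1@10", "AT1G01010.1@3"]
names = [
    "coronopifolia_tair_real-AT2G35040.1@10",
    "coronopifolia_tair_real-AT5G99999.1@7",
]

# str.endswith accepts a tuple (not a list) of suffixes and returns True
# if the string ends with any one of them.
suffixes = tuple(paralogs_in_all)
matching = [name for name in names if name.endswith(suffixes)]
print(matching)
```

Inside the parsing loop this would become a single test, if s_record.id.endswith(suffixes):, so each record is checked once and cannot be appended twice.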