Tags: python, sequence, biopython, fasta

Biopython SeqIO: AttributeError: 'str' object has no attribute 'id'


I am trying to filter out sequences using SeqIO but I am getting this error.

Traceback (most recent call last):
  File "paralog_warning_filter.py", line 61, in <module>
.
.
.
    SeqIO.write(desired_proteins, "filtered.fasta","fasta")
AttributeError: 'str' object has no attribute 'id'

I checked other similar questions but still couldn't understand what is wrong with my script.

Here is the relevant part of the script I am trying:

fh=open('lineageV_paralog_warning_genes.fasta')
for s_record in SeqIO.parse(fh,'fasta'):
    name = s_record.id
    seq = s_record.seq
    for i in paralogs_in_all:
        if name.endswith(i):
            desired_proteins=seq
            output_file=SeqIO.write(desired_proteins, "filtered.fasta","fasta")
output_file
fh.close()

I have a separate paralogs_in_all list, which is the source of the IDs. When I print name, it returns proper string IDs in this format: >coronopifolia_tair_real-AT2G35040.1@10.

Can you help me understand my problem? Thanks in advance.


Solution

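  • Try this and let us know (I can't test your code). SeqIO.write expects SeqRecord objects, but your script passes it a bare Seq; iterating over a Seq yields plain strings, hence the 'str' object has no attribute 'id' error. Collect SeqRecord objects in a list and write them once, after the loop: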

    
    from Bio.SeqRecord import SeqRecord
    from Bio import SeqIO
    ......
    .......
    
    desired_proteins = []  # collect SeqRecord objects, not bare sequences
    
    with open('lineageV_paralog_warning_genes.fasta') as fh:
        for s_record in SeqIO.parse(fh, 'fasta'):
            name = s_record.id
            seq = s_record.seq  # already a Seq object, so no Seq(seq) wrapper is needed
            for i in paralogs_in_all:
                if name.endswith(i):
                    # description="" removes the <unknown description> that would otherwise be written
                    desired_proteins.append(SeqRecord(seq, id=name, description=""))
                    break  # avoid adding the same record twice if several suffixes match
    
    # Write all matches once, after the loop. Given a filename, SeqIO.write
    # overwrites the file on every call; to append instead, pass it a handle
    # opened in 'a' mode. The return value is the number of records written.
    output_file = SeqIO.write(desired_proteins, "filtered.fasta", "fasta")
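
For anyone who wants to try this without the original files, here is a minimal, self-contained sketch of the same idea. The two FASTA entries, the second gene ID, and the helper names filter_records and append_record are made up for illustration; only the ID format and the paralogs_in_all name come from the question. It shows why passing a Seq fails, then the fix of writing SeqRecord objects, and finally one possible way to append to a file by handing SeqIO.write a handle opened in 'a' mode.

```python
import os
import tempfile
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq

# Hypothetical stand-ins for the input file and the ID list from the question.
FASTA_TEXT = """>coronopifolia_tair_real-AT2G35040.1@10
ATGGCTAGCTAG
>coronopifolia_tair_real-AT1G01010.1@3
ATGCCCGGGTAA
"""
paralogs_in_all = ["AT2G35040.1@10"]


def filter_records(handle, suffixes):
    """Return the parsed SeqRecord objects whose id ends with any suffix."""
    suffixes = tuple(suffixes)  # str.endswith accepts a tuple of candidates
    return [rec for rec in SeqIO.parse(handle, "fasta") if rec.id.endswith(suffixes)]


def append_record(path, record):
    """Append one record to a FASTA file by opening it in 'a' mode."""
    with open(path, "a") as out_handle:
        return SeqIO.write(record, out_handle, "fasta")


# Why the original call failed: iterating over a Seq yields plain one-letter
# strings, and SeqIO.write then looks for .id on each of them.
first_item = next(iter(Seq("ATG")))
print(type(first_item).__name__)

# The fix: hand SeqIO.write a list of SeqRecord objects, once, after filtering.
kept = filter_records(StringIO(FASTA_TEXT), paralogs_in_all)
out = StringIO()
n_written = SeqIO.write(kept, out, "fasta")  # number of records written
print(n_written)
print(out.getvalue())

# Appending record by record: each call adds to the file instead of replacing it.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "filtered.fasta")
    for rec in SeqIO.parse(StringIO(FASTA_TEXT), "fasta"):
        append_record(out_path, rec)
    n_in_file = sum(1 for _ in SeqIO.parse(out_path, "fasta"))
print(n_in_file)
```

Since the records returned by SeqIO.parse are already SeqRecord objects, this sketch writes them directly rather than rebuilding them. Rebuilding with SeqRecord(seq, id=name, description="") as in the answer above also works and is mainly useful if you want to change the header.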