I am trying to use Bio and SeqIO to open a FASTA file that contains multiple sequences, edit the names of the sequences to remove the '.seq' at the end of every name (>SeqID20.seq should become >SeqID20), then write all the sequences to a new FASTA file, but I get the following error:
AttributeError: 'str' object has no attribute 'id'
This is what I started with:
with open('lots_of_fasta_in_file.fasta') as f:
    for seq_record in SeqIO.parse(f, 'fasta'):
        name, sequence = seq_record.id, str(seq_record.seq)
        pair = [name.replace('.seq',''), sequence]
        SeqIO.write(pair, "new.fasta", "fasta")
But I have also tried this and get the same error:
file_in = 'lots_of_fasta_in_file.fasta'
file_out = 'new.fasta'
with open(file_out, 'w') as f_out:
    with open(file_in, 'r') as f_in:
        for seq_record in SeqIO.parse(f_in, 'fasta'):
            name, sequence = seq_record.id, str(seq_record.seq)
            # remove .seq from ID and add features
            pair = [name.replace('.seq',''), sequence]
            SeqIO.write(pair, file_out, 'fasta')
I assume I'm making some error in going from my list 'pair' to writing to a new file, but I'm not sure what to change. Any help would be appreciated!
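Your error occurs because SeqIO.write accepts a SeqRecord or a list/iterator of SeqRecords, but you are feeding it just a list like [name, sequence]. It loops over that list treating each element as a record, so the first string it reaches raises AttributeError: 'str' object has no attribute 'id'. Instead, I suggest you just modify the SeqRecord's .id and .description (note: if there is whitespace in the header line, you'll need to handle this too). Also, it is most efficient (across Biopython versions) to write all the records at once, rather than calling .write on each iteration. When you pass a filename, each call also reopens the file for writing, so even with valid records only the last one would survive: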
from Bio import SeqIO

def yield_records():
    with open('lots_of_fasta_in_file.fasta') as f:
        for seq_record in SeqIO.parse(f, 'fasta'):
            seq_record.id = seq_record.description = seq_record.id.replace('.seq','')
            yield seq_record

SeqIO.write(yield_records(), 'new.fasta', 'fasta')
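If you would rather keep your original (name, sequence) approach, the fix is to wrap each pair in a SeqRecord before handing it to SeqIO.write. The sketch below is a minimal illustration under a few assumptions: the FASTA text is an in-memory stand-in for your file (FASTA_IN and renamed_records are hypothetical names), only a trailing '.seq' is stripped (unlike .replace, which would also hit '.seq' in the middle of an ID), and any text after the first whitespace in a header is kept rather than dropped.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical stand-in for the contents of 'lots_of_fasta_in_file.fasta';
# the second header has extra text after whitespace.
FASTA_IN = """>SeqID20.seq
ACGTACGT
>SeqID21.seq sample B
GGCCTTAA
"""

SUFFIX = ".seq"


def renamed_records(handle):
    """Yield SeqRecords whose IDs have a trailing '.seq' removed."""
    for seq_record in SeqIO.parse(handle, "fasta"):
        old_id = seq_record.id
        # Strip the suffix only when it is at the end of the ID
        name = old_id[: -len(SUFFIX)] if old_id.endswith(SUFFIX) else old_id
        sequence = str(seq_record.seq)
        # The description is the whole header line, starting with the old ID;
        # keep whatever follows the ID (e.g. " sample B") after the new name.
        rest = seq_record.description[len(old_id):]
        # SeqIO.write needs SeqRecord objects, not [name, sequence] lists
        yield SeqRecord(Seq(sequence), id=name, description=name + rest)


out = StringIO()
# Write every record in a single call instead of once per loop iteration
count = SeqIO.write(renamed_records(StringIO(FASTA_IN)), out, "fasta")
print(out.getvalue(), end="")
```

To work on real files, pass an open input handle (or a filename) to renamed_records and 'new.fasta' as the second argument to SeqIO.write. If you want headers reduced to just the new ID, set description=name instead.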