python bioinformatics biopython fastq

Error when changing FASTQ headers and writing them back with BioPython


I am trying to append the suffixes /1 and /2 to FASTQ headers and write the records back to a new file. However, I get this error:

No suitable quality scores found in letter_annotations of SeqRecord 

Is there any way to solve this problem? Do I need to modify the quality score information to match the changed FASTQ header?

import sys
from Bio.Seq import Seq
from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

file = sys.argv[1]
final_records=[]
for seq_record in SeqIO.parse(file, "fastq"):
    print(seq_record.format("fastq"))
    # read header
    header = seq_record.id
    # add /1 at the end
    header = "{0}/1".format(header)
    # print(repr(seq_record.seq))
    record = SeqRecord(seq_record.seq,id=header,description=seq_record.description)
    final_records.append(record)
SeqIO.write(final_records, "my_example.fastq", "fastq")

Solution

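  • You're getting the error because the new SeqRecord objects are built from only the sequence, id, and description, so they carry no quality scores, and the FASTQ writer needs them. You could transfer the quality scores from the input sequences: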

    record.letter_annotations["phred_quality"]=seq_record.letter_annotations["phred_quality"]
    

    It's probably easier to just modify the ids of the original sequences and write them to the output file, though:

    seq_record.id = header
    final_records.append(seq_record)
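
Putting the second approach together, here is a minimal sketch rather than a definitive implementation. The helper name `add_suffix` and the in-memory example record are made up for illustration. It also rewrites `description`, on the assumption (based on how Biopython's FASTQ parser and writer appear to behave) that the parser stores the whole title line there and the writer would otherwise emit both the new id and the old title.

```python
from io import StringIO

from Bio import SeqIO


def add_suffix(records, suffix):
    """Yield each record with `suffix` appended to its id (hypothetical helper).

    The original SeqRecord is modified in place, so its
    letter_annotations["phred_quality"] values are kept.
    """
    for record in records:
        old_id = record.id
        record.id = old_id + suffix
        # Assumption: the description starts with the old id (the FASTQ parser
        # stores the full title line there). Keep it in sync with the new id
        # so the written header does not contain the old title as well.
        if record.description.startswith(old_id):
            record.description = record.id + record.description[len(old_id):]
        yield record


# Tiny in-memory FASTQ input so the sketch runs without a file on disk.
example = StringIO("@read1 sample=A\nACGT\n+\nIIII\n")
output = StringIO()

# SeqIO.write accepts any iterable of records, so nothing is held in a list.
count = SeqIO.write(add_suffix(SeqIO.parse(example, "fastq"), "/1"), output, "fastq")
result = output.getvalue()
print(result)
```

For real data, replace the two `StringIO` objects with the input and output file names, and run it once with "/1" and once with "/2" for the two files of a pair.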