I have a FASTA ID (Q99424 in the example) and I need to extract the corresponding sequence for that ID. I am using Biopython (the Bio package) for that, which represents each record as below:
SeqRecord(seq=Seq('MGSPVHRVSLGDTWSRQMHPDIESERYMQSFDVERLTNILDGGAQNTALRRKVE...SKL'), id='sp|Q99424|ACOX2_HUMAN', name='sp|Q99424|ACOX2_HUMAN', description='sp|Q99424|ACOX2_HUMAN', dbxrefs=[])
I can iterate over every record and search, but that is not the best way because I have more than 12000 records to search.
This is how we can iterate over all sequences:
from Bio import SeqIO
for record in SeqIO.parse(handle, "fasta"):
    name = record.name
    # The accession is the second "|"-separated field, e.g. Q99424
    accession = record.name.split("|")[1]
You can use pyfaidx (https://pythonhosted.org/pyfaidx/). Generate the FASTA index (".fai") with this module or with samtools, then use pyfaidx's fetch function to pull out the sequence you need without iterating over every record.
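To show why an index helps, here is a minimal standard-library sketch of the idea: scan the file once, remember the byte offset of each header, then seek straight to a record on lookup. This is not pyfaidx's implementation or its .fai format. The function names build_offset_index and fetch_sequence are hypothetical, the toy IDs and sequences are made up, and the code assumes headers of the form sp|ACCESSION|NAME as in the question.

```python
import os
import tempfile


def build_offset_index(path):
    """Map each accession to the byte offset of its header line (one pass)."""
    index = {}
    with open(path, "rb") as handle:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">") and len(line.split()) > 0:
                # Assumed header shape: ">sp|Q99424|ACOX2_HUMAN optional text"
                first_token = line[1:].split()[0].decode("ascii")
                fields = first_token.split("|")
                # Fall back to the whole token if there is no "|" in the header.
                accession = fields[1] if len(fields) > 1 else first_token
                index[accession] = offset
    return index


def fetch_sequence(path, index, accession):
    """Seek to the record's header and join sequence lines up to the next header."""
    parts = []
    with open(path, "rb") as handle:
        handle.seek(index[accession])
        handle.readline()  # skip the header line itself
        for line in handle:
            if line.startswith(b">"):
                break
            parts.append(line.strip().decode("ascii"))
    return "".join(parts)


# Toy data with made-up IDs and sequences, only to exercise the functions.
toy_fasta = (
    ">sp|X00001|TOY1_HUMAN first toy record\n"
    "MKTAYIAK\n"
    "QRQISFVK\n"
    ">sp|X00002|TOY2_HUMAN second toy record\n"
    "GSHMLEDP\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(toy_fasta)
    toy_path = tmp.name

index = build_offset_index(toy_path)
first_seq = fetch_sequence(toy_path, index, "X00001")
second_seq = fetch_sequence(toy_path, index, "X00002")
print(first_seq)
print(second_seq)
os.remove(toy_path)
```

The index is built once, so each of the lookups afterwards is a dictionary access plus a seek rather than a scan of the whole file. Keying the index on the bare accession also avoids having to pass the full sp|Q99424|ACOX2_HUMAN name when you only have Q99424.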