python sequence bioinformatics fasta

Extract sequence using sequence ID in fasta file


I have a FASTA ID (Q99424 in the example) and I need to extract the corresponding sequence for that ID. I am using Biopython (the Bio package), which represents each record as follows:

SeqRecord(seq=Seq('MGSPVHRVSLGDTWSRQMHPDIESERYMQSFDVERLTNILDGGAQNTALRRKVE...SKL'), id='sp|Q99424|ACOX2_HUMAN', name='sp|Q99424|ACOX2_HUMAN', description='sp|Q99424|ACOX2_HUMAN', dbxrefs=[])

I can iterate over every record and compare IDs, but a full scan per lookup is not the best way because I have more than 12000 records to search.

This is how we can iterate over all sequences:

from Bio import SeqIO

for record in SeqIO.parse(handle, "fasta"):
  name = record.name
  # The accession is the second "|"-separated field, e.g. "Q99424"
  accession = record.name.split("|")[1]

Solution

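  • You can use pyfaidx: https://pythonhosted.org/pyfaidx/ Generate the FASTA index (".fai") with this module or with samtools, then use pyfaidx's fetch function to pull a sequence by name without scanning the whole file. By default the index is keyed by the header name (sp|Q99424|ACOX2_HUMAN in the example), so look up that name rather than the bare accession.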
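
If adding a dependency is not an option, the same idea (read the file once, then look sequences up by key) can be sketched with the standard library alone. This is not pyfaidx's method: it keeps every sequence in memory, which should be fine for around 12000 protein records but will not scale to very large files. The function name index_by_accession and the demo data are made up for illustration; the first sequence is a shortened prefix of the one in the question and the second record is invented. Biopython's SeqIO.to_dict and SeqIO.index also take a key_function argument that can apply the same "|" split.

```python
import io


def index_by_accession(handle):
    """Map each accession (second '|' field of the header) to its sequence.

    Headers without a '|' are keyed by their first whitespace-delimited token.
    """
    index = {}
    accession = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Store the previous record before starting a new one.
            if accession is not None:
                index[accession] = "".join(chunks)
            parts = line[1:].split()
            header = parts[0] if parts else ""
            fields = header.split("|")
            accession = fields[1] if len(fields) > 1 else header
            chunks = []
        elif accession is not None:
            chunks.append(line)
    # Store the final record.
    if accession is not None:
        index[accession] = "".join(chunks)
    return index


# Hypothetical two-record input; with a real file, pass open("proteins.fasta").
demo = io.StringIO(
    ">sp|Q99424|ACOX2_HUMAN\nMGSPVHRV\nSLGDTW\n"
    ">sp|X00000|DEMO_HUMAN\nAAAA\n"
)
sequences = index_by_accession(demo)
print(sequences["Q99424"])  # prints MGSPVHRVSLGDTW
```

After the single pass, each lookup is a dictionary access, so checking many IDs no longer costs a full scan per ID.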