Tags: python, python-2.7, biopython, ncbi

How do I navigate results of a Biopython Entrez efetch?


When I run the following:

from Bio.Blast import NCBIWWW
from Bio import Entrez, SeqIO
Entrez.email = "[email protected]"
handle = Entrez.efetch(db="Protein", id= "75192198", rettype = "xml")
record = Entrez.read(handle)

I get back a "Bio.Entrez.Parser.DictionaryElement" that is really difficult to search through. If I want to, say, get the amino acid sequence, I have to type something like this:

record["Bioseq-set_seq-set"][0]["Seq-entry_seq"]["Bioseq"]["Bioseq_inst"]["Seq-inst"]["Seq-inst_seq-data"]["Seq-data"]["Seq-data_iupacaa"]["IUPACaa"]

I know that there has to be an easier way to index the elements in these results. If anyone out there can lend me a hand with this I'd appreciate it very much.


Solution

  • If what you want is the sequence, then instead of querying it in "xml" format, query it in (for example) FASTA format, by changing the rettype argument. Then it's as simple as parsing it using SeqIO.

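    handle = Entrez.efetch(db="Protein", id="75192198", rettype="fasta")
    
    # SeqIO.parse yields one SeqRecord per FASTA entry in the handle
    for r in SeqIO.parse(handle, "fasta"):
        # This print form gives the same output on Python 2 and Python 3
        print("%s %s" % (r.id, r.seq))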
    

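    This works because the handle contains plain FASTA text. Reading it directly (from a fresh efetch call, since a handle already consumed by SeqIO.parse has nothing left to read) gives: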

    print handle.read()
    # >gi|75192198|sp|Q9MAH8.1|TCP3_ARATH RecName: Full=Transcription factor TCP3
    # MAPDNDHFLDSPSPPLLEMRHHQSATENGGGCGEIVEVQGGHIVRSTGRKDRHSKVCTAKGPRDRRVRLS
    # APTAIQFYDVQDRLGFDRPSKAVDWLITKAKSAIDDLAQLPPWNPADTLRQHAAAAANAKPRKTKTLISP
    # PPPQPEETEHHRIGEEEDNESSFLPASMDSDSIADTIKSFFPVASTQQSYHHQPPSRGNTQNQDLLRLSL
    # QSFQNGPPFPNQTEPALFSGQSNNQLAFDSSTASWEQSHQSPEFGKIQRLVSWNNVGAAESAGSTGGFVF
    # ASPSSLHPVYSQSQLLSQRGPLQSINTPMIRAWFDPHHHHHHHQQSMTTDDLHHHHPYHIPPGIHQSAIP
    # GIAFASSGEFSGFRIPARFQGEQEEHGGDNKPSSASSDSRH
    

    If you still want some of the other metadata (such as transcription factor binding sites within the gene, or the taxonomy of the organism), you can also download the record in GenBank format by passing rettype="gb" to efetch and parsing the result with SeqIO using the "gb" format. You can learn more about that in the example here.
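
As a rough sketch of the GenBank route described above: the helper names `fetch_genbank` and `summarize` are made up for illustration, the email address is a placeholder, and the annotation keys ("organism", "taxonomy") are the ones Biopython's GenBank parser normally fills in, so treat the exact fields as an assumption and inspect `record.annotations` on your own data.

```python
def summarize(record):
    """Collect a few metadata fields from a SeqRecord-like object.

    Hypothetical helper: it only relies on the attributes .id,
    .description, .annotations (a dict) and .features (objects with
    a .type attribute), which Biopython's SeqRecord provides.
    """
    return {
        "id": record.id,
        "description": record.description,
        # .get() is used because not every record carries these keys
        "organism": record.annotations.get("organism"),
        "taxonomy": record.annotations.get("taxonomy", []),
        # Distinct feature types present in the record, sorted
        "feature_types": sorted({f.type for f in record.features}),
    }


def fetch_genbank(protein_id, email):
    """Fetch one protein record in GenBank format (needs network access)."""
    # Imported here so that summarize() can be used without Biopython.
    from Bio import Entrez, SeqIO

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="protein", id=protein_id,
                           rettype="gb", retmode="text")
    try:
        # SeqIO.read expects exactly one record in the handle
        return SeqIO.read(handle, "gb")
    finally:
        handle.close()


# Usage (requires Biopython and a network connection):
# record = fetch_genbank("75192198", "you@example.com")
# print(summarize(record))
# print(record.seq)
```

The sequence is still available as `record.seq`, so the GenBank format gives you everything the FASTA route does plus annotations and features, without walking the nested XML dictionary.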