Tags: python, biopython, genbank

Entrez and SeqIO "no records found in handle"


My code looks like this:

import re
from Bio import SeqIO
from Bio import Entrez

Entrez.email = "...@..." # My e-mail address

handle1 = Entrez.efetch(db="pubmed", id=pmid_list_2010, rettype="gb", retmode="text")
data1 = handle1.read()
handle1.close()
handle2 = Entrez.efetch(db="pubmed", id=pmid_list_2011, rettype="gb", retmode="text")
data2 = handle2.read()
handle2.close()
handle3 = Entrez.efetch(db="pubmed", id=pmid_list_2012, rettype="gb", retmode="text")
data3 = handle3.read()
handle3.close()
handle4 = Entrez.efetch(db="pubmed", id=pmid_list_2013, rettype="gb", retmode="text")
data4 = handle4.read()
handle4.close()
handle5 = Entrez.efetch(db="pubmed", id=pmid_list_2014, rettype="gb", retmode="text")
data5 = handle5.read()
handle5.close()
handle6 = Entrez.efetch(db="pubmed", id=pmid_list_2015, rettype="gb", retmode="text")
data6 = handle6.read()
handle6.close()

out_handle = open("test2.gb", "w")
out_handle.write(data1)
out_handle.write(data2)
out_handle.write(data3)
out_handle.write(data4)
out_handle.write(data5)
out_handle.write(data6)
out_handle.close()

in_handle = open("test2.gb", "r")
record = SeqIO.read(in_handle,"genbank")
in_handle.close()

The second-to-last line (the SeqIO.read call) raises this error:

ValueError: No records found in handle

My file looks fine - it's not empty or anything. There are plenty of records and, as far as I can tell, it's in the correct format. What exactly am I doing wrong?

I have noticed that this works with other databases, "nucleotide" for example. Is it an issue with PubMed? Does that require a different format? Thanks.


Solution

  • You are trying to parse the wrong format. The "pubmed" database does not offer a GenBank rettype; as text it only returns the rettypes medline, uilist or abstract. You ask for rettype="gb", which makes no sense in this context, so the text you save is not GenBank-formatted and SeqIO finds no records in it.

    Instead you could use the Medline parser:

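    from Bio import Entrez
    from Bio import Medline

    Entrez.email = "...@..."  # your e-mail address

    # Request MEDLINE-formatted text, which the "pubmed" database supports
    h1 = Entrez.efetch(db="pubmed",
                       id=["26837606"],
                       rettype="medline",
                       retmode="text")

    # Medline.parse yields one dict-like record per article
    for record in Medline.parse(h1):
        print(record["TI"])  # "TI" is the title field
    h1.close()
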

    Outputs

    Exploiting the CRISPR/Cas9 System for Targeted Genome Mutagenesis in Petunia.
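
The question fetches six year-wise PMID lists and concatenates the results into one file. The same workflow can be sketched with the MEDLINE format instead of GenBank. This is only a rough outline under some assumptions: the helper names `fetch_medline`, `save_batches` and `read_titles`, the placeholder e-mail address, and the output file name are illustrative and do not come from the original post or from Biopython. The Biopython imports sit inside the functions so that the batching helper can be tried offline with a stand-in fetch function.

```python
from typing import Callable, Iterable, List, Sequence


def fetch_medline(pmids: Sequence[str]) -> str:
    """Fetch MEDLINE-formatted text for one batch of PubMed IDs (needs network)."""
    from Bio import Entrez  # imported here so save_batches works without Biopython

    Entrez.email = "you@example.org"  # placeholder; replace with your own address
    handle = Entrez.efetch(db="pubmed", id=",".join(pmids),
                           rettype="medline", retmode="text")
    try:
        return handle.read()
    finally:
        handle.close()


def save_batches(batches: Iterable[Sequence[str]], path: str,
                 fetch: Callable[[Sequence[str]], str] = fetch_medline) -> int:
    """Fetch each non-empty batch and append its text to one file.

    Returns the number of batches written.
    """
    written = 0
    with open(path, "w") as out:
        for pmids in batches:
            if not pmids:
                continue  # skip empty ID lists instead of sending an empty query
            text = fetch(pmids)
            # End each batch with a blank line so records stay separated
            out.write(text.rstrip("\n") + "\n\n")
            written += 1
    return written


def read_titles(path: str) -> List[str]:
    """Parse the saved file with the Medline parser and collect the titles."""
    from Bio import Medline

    with open(path) as handle:
        # Medline.parse iterates over every record in the file
        return [record.get("TI", "") for record in Medline.parse(handle)]
```

With the lists from the question, this might be called as `save_batches([pmid_list_2010, pmid_list_2011], "test2.txt")` (and so on for the remaining years), followed by `read_titles("test2.txt")`.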