Tags: python, biopython, http.client

How to handle IncompleteRead in Biopython


I am trying to fetch FASTA sequences for accession numbers from NCBI using Biopython. Usually the sequences download successfully, but once in a while I get the following error:

http.client.IncompleteRead: IncompleteRead(61808640 bytes read)

I have searched existing answers, including "How to handle IncompleteRead: in python".

I have tried the top answer, https://stackoverflow.com/a/14442358/4037275, and it does stop the error. However, it downloads only partial sequences. Is there another way? Can anyone point me in the right direction?

from Bio import Entrez
from Bio import SeqIO
Entrez.email = "my email id"  # NCBI asks Entrez users to supply a contact address


def extract_fasta_sequence(NC_accession):
    """Fetch the FASTA sequence for an NC_ accession number from NCBI."""
    print("Extracting the fasta sequence for the NC_accession:", NC_accession)
    handle = Entrez.efetch(db="nucleotide", id=NC_accession, rettype="fasta", retmode="text")
    try:
        # read() may raise http.client.IncompleteRead if the connection drops
        record = handle.read()
    finally:
        handle.close()
    return record
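Why the workaround yields partial sequences can be shown without touching the network. When the server closes the connection early, `http.client.IncompleteRead` is raised, and its `partial` attribute holds only the bytes received so far. A workaround that catches the exception and returns `e.partial` (which, as far as I can tell, is how the linked answer works) therefore hands back a truncated FASTA record with no error at all. The sketch below simulates this; `read_or_partial` and `flaky_read` are hypothetical names, and the record text is made up for illustration.

```python
import http.client


def read_or_partial(read):
    """Mimic the 'return e.partial' workaround around a read() call.

    `read` is a hypothetical stand-in for handle.read: any zero-argument
    callable that may raise http.client.IncompleteRead.
    """
    try:
        return read()
    except http.client.IncompleteRead as e:
        # e.partial holds only the bytes received before the connection dropped
        return e.partial


def flaky_read():
    # Simulated failure: the server promised 1000 bytes but sent only these.
    raise http.client.IncompleteRead(b">example_record\nACGTACGT", expected=1000)


data = read_or_partial(flaky_read)
print(data)  # truncated FASTA text, returned without any error being raised
```

Nothing downstream can tell this result apart from a short but complete record, which is why catching the error and repeating the whole request is safer than keeping the partial data.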

Solution

  • You will need to add a try/except to catch common network errors like this, and retry the download rather than keeping the partial data. Note that http.client.IncompleteRead (httplib.IncompleteRead in Python 2) is a subclass of the more general HTTPException, see: https://docs.python.org/3/library/http.client.html#http.client.IncompleteRead

    For example: http://lists.open-bio.org/pipermail/biopython/2011-October/013735.html

    See also https://github.com/biopython/biopython/pull/590, which would catch some of the other errors you can get with the NCBI Entrez API (errors NCBI ought to deal with but doesn't).
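A minimal sketch of the try/except-and-retry approach, assuming Biopython is installed and `Entrez.email` is set as in the question. `fetch_with_retry` and `efetch_fasta` are hypothetical helper names, and the number of attempts and the delay between them are arbitrary choices, not NCBI recommendations. Each retry issues a fresh request, so data from a failed attempt is never returned.

```python
import http.client
import time


def fetch_with_retry(fetch, attempts=3, delay=5.0):
    """Call `fetch()` until it completes without an HTTPException.

    `fetch` is a zero-argument callable that performs the whole request
    and returns the complete response body. Partial data from a failed
    attempt is discarded; every retry starts a fresh request.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except http.client.HTTPException as err:  # IncompleteRead is a subclass
            if attempt == attempts:
                raise  # give up: re-raise the last error
            print(f"Attempt {attempt} failed ({err!r}); retrying in {delay}s")
            time.sleep(delay)


def efetch_fasta(accession):
    """Download one FASTA record with Biopython (needs Bio and network access)."""
    from Bio import Entrez  # imported here so fetch_with_retry works without Biopython

    def fetch():
        handle = Entrez.efetch(db="nucleotide", id=accession,
                               rettype="fasta", retmode="text")
        try:
            return handle.read()
        finally:
            handle.close()

    return fetch_with_retry(fetch)
```

Inside the question's `extract_fasta_sequence`, `record = efetch_fasta(NC_accession)` would replace the direct `efetch`/`read` calls. Catching only `HTTPException` means that errors urllib reports as `urllib.error.HTTPError` (for example, an HTTP 400 response) fail immediately instead of being retried; adding `OSError` to the except clause would also retry connection resets, at the cost of retrying those HTTP error responses too.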