Handle "500" server error type during API request using rentrez


I am trying to retrieve the taxonomy IDs linked to a set of names using the rentrez package, an R wrapper around the Entrez API. This is my code (with a short list of queries as an example):

library(rentrez)

vect_names <- c("Theileria sergenti","Dipodascus ambrosiae","Dipodascus armillariae","Dipodascus macrosporus")


idseq <- lapply(vect_names, function(x){
  query <- entrez_search(db = "taxonomy", term = x)
  return(query$ids)
})

This code works as long as I get no server error (HTTP 500), which stops my requests. For a small number of queries this is not a problem, but I have around 40k queries to send, so I will certainly hit the error at some point. This is the error:

Error: HTTP failure: 500
{"error":"error forwarding request","api-key":"xxx.xx.xx.xxx","type":"ip",
"status":"ok"

I did some research and I think I need to wrap this code in a try/except-style error handler. However, the documentation is pretty intimidating to me, and I don't see how to reproduce the server error so that I could build a reproducible example. Also, because my full run will last several hours, testing several versions of the error handling until I am sure the code copes with the error could take a long time.

So what I am looking for is a version of the first piece of code that keeps re-sending the request for the same vector element until it gets a result (that is, until the HTTP failure clears, which should only take a few seconds).

Thanks!


Solution

  • After some research, I found that I needed the tryCatch function combined with Sys.sleep:

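    library(rentrez)

    idseq <- lapply(vect_names, function(x) {
      # Return the IDs of a search result, or NA if nothing was found.
      # (ifelse(is.na(query), ...) tests every element of the result list
      # and recycles the IDs, so an explicit length check is used instead.)
      get_ids <- function(query) {
        if (length(query$ids) == 0) NA_character_ else query$ids
      }

      tryCatch(
        get_ids(entrez_search(db = "taxonomy", term = x)),
        error = function(e) {
          # On error (most likely a 500 server error), wait 60 s and retry once.
          # A second failure is not caught here and will stop the whole loop.
          Sys.sleep(60)
          get_ids(entrez_search(db = "taxonomy", term = x))
        }
      )
    })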
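
The tryCatch approach above retries each query only once, so a second failure on the same element would still abort a long run. A possible way around this is a small retry loop, sketched below. Note that `with_retry` and `make_flaky` are hypothetical helper names introduced here for illustration (they are not part of rentrez), and the defaults of 5 attempts and 60 seconds are assumptions to adjust. `make_flaky` builds a stand-in function that fails a fixed number of times with a message mimicking the 500 error, which is one way to exercise the error handling offline without waiting for the real server to fail.

```r
# Hypothetical helper (not part of rentrez): call fun(...) and retry on error.
# Returns the result of fun(...), or NULL if every attempt failed.
with_retry <- function(fun, ..., max_tries = 5, wait = 60) {
  for (attempt in seq_len(max_tries)) {
    outcome <- tryCatch(
      list(ok = TRUE, value = fun(...)),
      error = function(e) list(ok = FALSE, value = conditionMessage(e))
    )
    if (outcome$ok) return(outcome$value)
    message("Attempt ", attempt, " failed: ", outcome$value)
    # Pause before the next attempt, but not after the last one
    if (attempt < max_tries) Sys.sleep(wait)
  }
  NULL
}

# Hypothetical stand-in for entrez_search(): the returned function raises an
# error on its first `n_failures` calls, then succeeds with a fake ID.
make_flaky <- function(n_failures) {
  calls <- 0
  function(term) {
    calls <<- calls + 1
    if (calls <= n_failures) stop("HTTP failure: 500")
    paste0("id_for_", term)
  }
}

# Offline check: two simulated failures, then success (wait = 0 for speed)
flaky_search <- make_flaky(2)
res <- with_retry(flaky_search, "Theileria sergenti", max_tries = 5, wait = 0)
print(res)

# With the real API the call would look something like this (not run here):
# idseq <- lapply(vect_names, function(x) {
#   query <- with_retry(entrez_search, db = "taxonomy", term = x)
#   if (is.null(query) || length(query$ids) == 0) NA_character_ else query$ids
# })
```

Capping the number of attempts instead of looping forever means a term that keeps failing ends up as NA in the result and can be re-queried afterwards, rather than blocking the remaining queries.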