
How to refresh or retry a specific web page using httr GET command?


I need to access the same web page with different "keys" to get specific content it provides.

I have a list of keys, x, and I use the GET command from the httr package to access the web page, then retrieve the information I need, y.

library(httr)
library(stringr)
library(XML)

for (i in 1:20) {
    # request the page for key x[i], giving up after 10 seconds
    h1 <- GET(paste0("http:....categories=&query=", x[i]), timeout(10))

    # parse the response body and pull the text of every link inside an <h3>
    par <- htmlParse(content(h1, as = "text"), asText = TRUE)
    y[i] <- xpathSApply(doc = par, path = "//h3/a", fun = xmlValue)
}

The problem is that the timeout is often reached, which disrupts the loop.

So I would like to refresh the web page or retry the GET command whenever the timeout is reached, because I suspect the problem is with the internet connection of the website I am trying to access.

As my code stands, a timeout breaks the loop entirely. I need to either ignore the error and move on to the next iteration, or retry accessing the website.


Solution

  • Look at purrr::safely(). You can wrap GET like so:

    safe_GET <- purrr::safely(GET)
    

    This removes the ugliness of tryCatch() by letting you do:

    resp <- safe_GET("http://example.com") # you can use all legal `GET` params
    

    And you can test resp$result for NULL. Put that into your retry loop and you're good to go.
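
    For example, a minimal retry wrapper might look like the sketch below; the name GET_with_retry and the tries/pause defaults are illustrative choices, not part of httr or purrr:

    GET_with_retry <- function(url, ..., tries = 3, pause = 2) {
      for (attempt in seq_len(tries)) {
        resp <- safe_GET(url, ...)  # the purrr::safely() wrapper from above
        if (!is.null(resp$result)) return(resp$result)  # success: hand back the response
        message("Attempt ", attempt, " failed: ", conditionMessage(resp$error))
        Sys.sleep(pause)  # brief pause before the next attempt
      }
      NULL  # all attempts failed
    }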

    You can see safe_GET() in action by doing:

    str(safe_GET("https://httpbin.org/delay/3", timeout(1)))
    

    which asks the httpbin service to wait 3s before responding while setting an explicit 1s timeout on the GET request. I wrapped the call in str() to show the result:

    List of 2
     $ result: NULL
     $ error :List of 2
      ..$ message: chr "Timeout was reached"
      ..$ call   : language curl::curl_fetch_memory(url, handle = handle)
      ..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
    

    So, you can even check the message if you need to.
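
    Putting it together with the loop from the question, a sketch might look like this (it reuses the hypothetical GET_with_retry helper from above and assumes x, y, and the truncated URL exist as in the original code):

    for (i in 1:20) {
      h1 <- GET_with_retry(paste0("http:....categories=&query=", x[i]), timeout(10))
      if (is.null(h1)) next  # every retry failed; skip this key
      par <- htmlParse(content(h1, as = "text"), asText = TRUE)
      y[i] <- xpathSApply(doc = par, path = "//h3/a", fun = xmlValue)
    }

    If you only want to retry on timeouts specifically, you could test something like grepl("Timeout", conditionMessage(resp$error)) inside the wrapper before sleeping and trying again.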