I need to access the same web page with different "keys" to get specific content it provides.
I have a list of keys x, and I use the GET function from the httr package to access the web page and then retrieve the information I need, y.
library(httr)
library(stringr)
library(XML)

for (i in 1:20) {
  h1 <- GET(paste0("http:....categories=&query=", x[i]), timeout(10))
  par <- htmlParse(file = h1)
  y[i] <- xpathSApply(doc = par, path = "//h3/a", fun = xmlValue)
}
The problem is that the timeout is often reached, and it disrupts the loop. I would like to retry the GET request when a timeout occurs, because I suspect the problem lies with the connection of the website I am trying to access. As the code stands, a timeout breaks the loop, so I need to either ignore the error and move on to the next iteration, or retry the request.
Look at purrr::safely(). You can wrap GET like so:
safe_GET <- purrr::safely(GET)
This removes the ugliness of tryCatch() by letting you do:
resp <- safe_GET("http://example.com") # you can use all legal `GET` params
You can then test resp$result for NULL. Put that into your retry loop (sketched below) and you're good to go.
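Here is a minimal sketch of what that retry loop might look like when dropped into the loop from your question. The URL placeholder, x, and y come from the question; max_tries, the one-second pause, and the content()/asText parsing step are assumptions you can adjust:

library(httr)
library(XML)

safe_GET <- purrr::safely(GET)
max_tries <- 3   # assumed retry limit; tune as needed

for (i in 1:20) {
  resp <- NULL
  for (attempt in 1:max_tries) {
    res <- safe_GET(paste0("http:....categories=&query=", x[i]), timeout(10))
    if (!is.null(res$result)) {   # success: keep the response and stop retrying
      resp <- res$result
      break
    }
    Sys.sleep(1)                  # brief pause before the next attempt
  }
  if (is.null(resp)) next         # every attempt failed: skip this key
  par <- htmlParse(content(resp, "text"), asText = TRUE)
  y[i] <- xpathSApply(doc = par, path = "//h3/a", fun = xmlValue)
}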
You can see this in action by doing:
str(safe_GET("https://httpbin.org/delay/3", timeout(1)))
which asks the httpbin service to wait 3 s before responding but sets an explicit 1 s timeout on the GET request. I wrapped it in str() to show the result:
List of 2
$ result: NULL
$ error :List of 2
..$ message: chr "Timeout was reached"
..$ call : language curl::curl_fetch_memory(url, handle = handle)
..- attr(*, "class")= chr [1:3] "simpleError" "error" "condition"
So, you can even check the message if you need to.
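For instance, a sketch of branching on that message (the exact wording comes from curl and may change between versions, so treat the string match as an assumption):

resp <- safe_GET("https://httpbin.org/delay/3", timeout(1))

if (is.null(resp$result)) {
  if (grepl("Timeout", resp$error$message)) {
    message("Request timed out; worth retrying")    # transient failure
  } else {
    message("Other error: ", resp$error$message)    # probably not worth retrying
  }
}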