Tags: r, web-crawler, repeat, http-status-code-503

Using repeat in R to retry a URL that returns HTTP 503


I am crawling a rather unstable website that sometimes returns HTTP 503 and only loads once the page is refreshed. So I wrote this code to make my crawler retry the 503 page until its content has been assigned to a variable:

repeat{
  info = NA
  info = read_html(url2)
  if(is.na(info) == F) {
    break
    }
}
info

But for some reason this does not work. R still stops with the following error instead of retrying:

Error in open.connection(x, "rb") : HTTP error 503.
> info
[1] NA

Sometimes I get the following warnings instead, but in that case the content is assigned to info without any problem:

Warning messages:
1: In for (i in seq_along(cenv$extra)) { :
  closing unused connection 6 (url)
2: In for (i in seq_along(cenv$extra)) { :
  closing unused connection 5 (url)

How can I write code that retries pages that return 503?


Solution

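  • read_html() signals an error on a 503, which aborts the repeat loop before the is.na() check is ever reached. You need to catch the error with tryCatch(). This should work: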

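    library(xml2)  # provides read_html(); rvest re-exports it

    counter = 0
    repeat {
      counter = counter + 1
      info = tryCatch(
        read_html(url2),
        # optional: treat warnings as failed attempts as well
        warning = function(w) NULL,
        error = function(e) NULL
      )
      # NULL marks a failed attempt; is.null() always gives the single
      # TRUE/FALSE that if() needs. Give up after 10 attempts.
      if (!is.null(info) || counter >= 10) {
        break
      }
      Sys.sleep(30)  # wait before the next attempt
    }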
    

    This is also the gist of what purrr::insistently does.
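
For comparison, here is a minimal sketch of the same retry idea using purrr::insistently. It assumes the purrr package is installed. The function flaky_fetch and the counter attempts are hypothetical stand-ins for read_html(url2), so the example runs without network access, and the short pause is chosen only to keep the demo quick.

```r
library(purrr)

# Hypothetical stand-in for read_html(url2): fails twice with a
# 503-style error, then succeeds on the third call.
attempts <- 0
flaky_fetch <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) {
    stop("HTTP error 503.")
  }
  "page content"
}

# Wrap the function so it is retried on error: at most 10 attempts,
# with a fixed pause between them (0.1 s here; the loop above waits 30 s).
insistent_fetch <- insistently(
  flaky_fetch,
  rate = rate_delay(pause = 0.1, max_times = 10),
  quiet = TRUE
)

info <- insistent_fetch()
```

For the real crawler, the wrapped function would presumably be read_html itself, along the lines of insistently(read_html, rate = rate_delay(pause = 30, max_times = 10))(url2). Unlike the loop above, insistently raises an error once the attempts are used up rather than returning a placeholder value.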