I am looping through a .csv file of URLs to scrape a website (which authorizes scraping).
I was using a `tryCatch` call to try to avoid breaks in my `for` loop.
But I noticed it still stops on some URLs (when using `download.file`).
So I am now using an "is this a valid URL?" function taken from this post: [Scrape with a loop and avoid 404 error]
```r
url_works <- function(url){
  tryCatch(
    identical(status_code(HEAD(url)), 200L),
    error = function(e){
      FALSE
    })
}
```
But even with this function, and looping only when it returns `TRUE`, at some point my loop still breaks on certain URLs with the following error:

> HTTP status was '500 Internal Server Error'

I would like to understand this error so that I can handle this case in the URL function and ignore such URLs if they come up again.
Any thoughts? Thanks!
Your `tryCatch` syntax is wrong; I also changed the error message to print the actual error. A generic `tryCatch` looks like:
```r
tryCatch({
  operation-you-want-to-try
}, error = function(e) do-this-on-error
)
```
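For instance, here is a minimal, self-contained sketch (the `safe_log` name is just an example, not from your code): `tryCatch` returns the value of the expression if it succeeds, or the value of the error handler if it fails, so you can fall back to `NA` instead of letting the error propagate.

```r
# tryCatch returns the expression's value on success,
# or the error handler's return value on failure
safe_log <- function(x) {
  tryCatch({
    log(x)
  }, error = function(e) {
    message("failed: ", conditionMessage(e))
    NA_real_  # fallback value instead of a crash
  })
}

safe_log(10)   # returns log(10), no error
safe_log("a")  # log() errors on a string; handler returns NA
```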
So for your code:
```r
url_works <- function(url){
  s1 <- tryCatch({
    status_code(HEAD(url))
  }, error = function(e) {
    print(paste0(url, " ", as.character(e)))
    NA_integer_  # return NA so the comparison below still works
  })
  identical(s1, 200L)
}
```

Note that `s1` must be assigned the *result* of `tryCatch`; if it were only assigned inside the block, it would not exist when `HEAD` errors, and `identical(s1, 200L)` would itself fail with "object 's1' not found".
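Also note that a `HEAD` request can return 200 while the later `GET` performed by `download.file` returns 500, so it is safer to guard `download.file` as well. Putting it together, your loop could look like the sketch below (`urls` and the destination file names are placeholders standing in for your .csv data):

```r
library(httr)

# Returns TRUE only if a HEAD request answers with status 200;
# any connection error (bad host, timeout, ...) yields FALSE
url_works <- function(url){
  s1 <- tryCatch({
    status_code(HEAD(url))
  }, error = function(e) {
    print(paste0(url, " ", as.character(e)))
    NA_integer_
  })
  identical(s1, 200L)
}

# Hypothetical loop: 'urls' would come from your .csv
for (url in urls) {
  if (!url_works(url)) next  # skip unreachable or non-200 URLs
  # download.file can still fail (e.g. a 500 on GET after a 200 on HEAD),
  # so guard it too instead of letting the loop break
  tryCatch(
    download.file(url, destfile = basename(url), quiet = TRUE),
    error = function(e) print(paste0(url, " ", as.character(e)))
  )
}
```

This way a 500 Internal Server Error (a server-side failure, nothing you can fix from the client) is logged and skipped rather than stopping the whole loop.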