I have a list of PDF URLs and want to download the PDFs. However, not all of the URLs still exist, which is why I check them beforehand with the RCurl function url.exists(). For some URLs, though, this function runs forever without returning a result, and I cannot even stop it with withTimeout().
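The workflow I am aiming for is roughly the following (a minimal sketch; the example URLs and the pdfs/ output folder are placeholders, not my real data):

library(RCurl)

urls <- c(
  "http://example.com/report_2010.pdf",    # placeholder URLs, not my real list
  "http://example.com/report_2011.pdf"
)

dir.create("pdfs", showWarnings = FALSE)

for (u in urls) {
  if (url.exists(u)) {                      # the check that sometimes hangs
    download.file(u, file.path("pdfs", basename(u)), mode = "wb")
  }
}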
I wrapped url.exists() in withTimeout(), but the timeout is not honored:
library(RCurl)
library(R.utils)
url <- "http://www.shangri-la.com/uploadedFiles/corporate/about_us/csr_2011/Shangri-La%20Asia%202010%20Sustainability%20Report.pdf"
withTimeout(url.exists(url), timeout = 15, onTimeout = "warning")
The function runs forever; the timeout is ignored.
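If I understand R.utils correctly, withTimeout() can only interrupt R-level code, so it cannot break out of a blocking call inside compiled code such as libcurl; that would explain why the timeout never fires. One possible workaround is to put the timeout on the request itself with httr's timeout() config (a sketch, not tested against the URL above; note that GET() fetches the whole body, whereas the header-only approach further below does not):

library(httr)

# timeout(15) is enforced by libcurl itself, so the request gives up after
# 15 seconds instead of blocking indefinitely; try() turns a timeout or
# connection error into a value we can test for.
resp <- try(GET(url, timeout(15)), silent = TRUE)
ok   <- !inherits(resp, "try-error") && status_code(resp) < 400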
Thus my question: how can I check such URLs without the check hanging forever? Other checks I tried (using RCurl's getBinaryURL() and httr's GET()), but which do not sort out this URL either, are the following (a more robust variant is sketched after these snippets):
try(length(getBinaryURL(url)) > 0) == TRUE
http_status(GET(url))
!class(try(GET(url))) == "try-error"
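Tying these attempts together, one way to apply a single, timeout-protected check to the whole URL vector before downloading might look like the sketch below; check_url() is just a name I made up, combining the try() idea from above with httr's HEAD() and timeout():

library(httr)

check_url <- function(u, seconds = 15) {
  # HEAD asks only for the header; timeout() caps how long libcurl waits
  resp <- try(HEAD(u, timeout(seconds)), silent = TRUE)
  !inherits(resp, "try-error") && status_code(resp) < 400
}

reachable <- urls[sapply(urls, check_url, USE.NAMES = FALSE)]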
A suggested alternative is httr's url_success(), which makes a header-only request:

library(httr)
urls <- c(
'https://www.deakin.edu.au/current-students/unitguides/UnitGuide.php?year=2015&semester=TRI-1&unit=SLE010',
'https://www.deakin.edu.au/current-students/unitguides/UnitGuide.php?year=2015&semester=TRI-2&unit=HMM202',
'https://www.deakin.edu.au/current-students/unitguides/UnitGuide.php?year=2015&semester=TRI-2&unit=SLE339'
)
sapply(urls, url_success, config(followlocation = 0L), USE.NAMES = FALSE)
This function is analogous to file.exists() and determines whether a request for a specific URL responds without error. We make the request but ask the server not to return the body; we just process the header.
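Worth noting: in more recent httr releases, url_success() has been deprecated; as far as I know the suggested replacement is to negate http_error(). The same header-only check would then look roughly like this:

# HEAD requests only the header, so no body is transferred;
# followlocation = 0L again keeps redirects from being followed
sapply(urls, function(u) !http_error(HEAD(u, config(followlocation = 0L))),
       USE.NAMES = FALSE)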