Tags: r, xml, sleep, rcurl

Delay scraping for a few minutes in an R for loop


I am trying to scrape a website, but it doesn't allow me to scrape more than 9 pages at a time. Is there any way I can stop the loop after 9 pages, pause for a minute or two, and then restart the scrape?

Here is the code:

    library(RCurl)
    library(stringr)
    library(XML)

    # url is the list/vector of page URLs to scrape, defined earlier
    jt <- list()
    for (i in 1:70) {
      tryCatch({
        html <- getURL(url[[i]], followlocation = TRUE)
        doc  <- htmlParse(html, asText = TRUE)
        new  <- xpathSApply(doc, "div/a", xmlValue)
        jt[[i]] <- new
      }, error = function(e) {
        cat("ERROR :", conditionMessage(e), "\n")
      })
    }

Solution

  • If you add if (i %% 9 == 0) { Sys.sleep(60) } inside the loop, it will pause for 60 seconds after every 9 iterations. The %% operator returns the remainder from dividing i by 9, so whenever that remainder is 0 you have just completed another batch of 9 pages. A sketch of the loop with the pause in place follows below.
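
Here is a minimal sketch of the loop with that pause folded in. It assumes url is the vector of page URLs from the question and leaves the rest of the scraping logic unchanged:

    library(RCurl)
    library(XML)

    jt <- list()
    for (i in 1:70) {
      tryCatch({
        html <- getURL(url[[i]], followlocation = TRUE)
        doc  <- htmlParse(html, asText = TRUE)
        jt[[i]] <- xpathSApply(doc, "div/a", xmlValue)
      }, error = function(e) {
        cat("ERROR :", conditionMessage(e), "\n")
      })
      # after every 9th page, wait 60 seconds before requesting the next one
      if (i %% 9 == 0) {
        Sys.sleep(60)
      }
    }

If one minute is not enough, you can raise the argument to Sys.sleep (it takes the pause length in seconds), e.g. Sys.sleep(120) for two minutes.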