How can you continue the for loop in R even after an error?


I am parsing data from multiple links, but some of those links break over time. When I parse them with the rvest package, a broken link raises an error or a warning. How can I keep the for loop running so that it moves on to the next link?


house_link <- "https://somon.tj/adv/7985721_2-komn-dom-grandzavod/"
house_features = data.frame()

for(x in 1:length(house_link)) {
  
   tryCatch({
      page_data = read_html(house_link[x])
      message("Executed.")
  }, error = function(e){
      message('Caught an error!')
      print(e)
  }, warning = function(w){
      message('Caught a warning!')
      print(w)
  }, finally = {
      message('All done, quitting.')
  }
)    
    pricing = page_data %>% html_nodes(".css-13sm4s4") %>% 
      html_element("span") %>% html_text() 
    house_features = rbind(house_features, data.frame(pricing, stringsAsFactors = FALSE))
}

Solution

  • Maybe something like this?

    library(rvest)
    
    house_link <- "https://lalafo.kg/bishkek/ads/104-seria-2-komnaty-47-kv-m-s-mebelu-kondicioner-zivotnye-ne-prozivali-id-95221626"
    house_features = data.frame()
    
    for(x in 1:3) { # 1:3 is only for this demo; with more than one link, iterate with seq_along(house_link) instead
      
      cat('Link', x)
      
      start_time <- Sys.time()
      if (x %% 200 == 0) {
        Sys.sleep(5)
        print("pausing ...")}
      
      # tryCatch() returns the value of the last expression it evaluates, so the
      # parsed document has to come last (message() on its own would return NULL).
      page_data <- tryCatch({
        doc <- read_html(house_link[x])
        message("Executed.")
        doc
      }, error = function(e){
        message('\nCaught an error!')
        NULL  # the handler's value becomes the value of tryCatch(); return() is not required
      }, finally = {
        cat('Continuing with', x+1,'\n')
      })
      
      # Skip the rest of this iteration when the page could not be read
      if (is.null(page_data)) {
        cat('this is a test\n')
        next
      }
      
      pricing = page_data %>% html_nodes(".css-13sm4s4") %>% 
        html_element("span") %>% html_text() 
      house_features = rbind(house_features, data.frame(pricing, stringsAsFactors = FALSE))
    }
    
    Link 1
    Caught an error!
    Continuing with 2 
    this is a test
    
    Link 2
    Caught an error!
    Continuing with 3 
    this is a test
    
    Link 3
    Caught an error!
    Continuing with 4 
    this is a test
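
The same pattern can be wrapped in a small helper so the loop body stays short. This is only a sketch, not rvest functionality: `safe_read` and `fake_reader` are hypothetical names, and `fake_reader` stands in for `read_html()` so the example runs without a network connection. It also shows why the loop in the question stops: when the handler runs, `page_data` is never assigned, so the next line fails or reuses the page from the previous iteration.

```r
# Hypothetical helper: call `reader(url)` and return NULL instead of stopping.
# In real use `reader` would be rvest::read_html; it is a parameter here so
# the pattern can be tried offline.
safe_read <- function(url, reader) {
  tryCatch(
    reader(url),
    error = function(e) {
      message("Skipping ", url, ": ", conditionMessage(e))
      NULL  # value of tryCatch() when an error was caught
    }
  )
}

# Stand-in for read_html() (an assumption for this demo): fails on any
# link that contains the word "broken".
fake_reader <- function(url) {
  if (grepl("broken", url, fixed = TRUE)) stop("HTTP error 404")
  paste("page for", url)
}

links <- c("https://example.com/a",
           "https://example.com/broken",
           "https://example.com/c")

pages <- list()
for (x in seq_along(links)) {
  page <- safe_read(links[x], fake_reader)
  if (is.null(page)) next      # broken link: move on to the next one
  pages[[links[x]]] <- page
}

names(pages)
```

Only the two working links end up in `pages`. With real data, `safe_read(house_link[x], read_html)` would take the place of the `fake_reader` call. Note that a `warning =` handler in `tryCatch()` abandons the expression at the first warning, so add one only if a warning should also count as a failed link.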