Tags: r, error-handling, try-catch, rvest

How to skip an error in a for loop in R


I want to web scrape the URLs of pictures from a list of web pages. I tried the following code.

library(rvest)

pic_flat = data.frame()

for (i in 7:60){
  # build the URL for each results page
  link <- paste0("https://www.immobilienscout24.at/regional/wien/wien/wohnung-kaufen/seite-", i)
  page <- read_html(link)
  # scrape the href attributes and build full apartment URLs
  href <- page %>% html_elements("a.YXjuW") %>% html_attr('href')
  apt_link <- paste0("https://www.immobilienscout24.at", href)
  pic_flat <- rbind(pic_flat, data.frame(apt_link))
}

# get the link to each apartment picture
apt_pic <- data.frame()
apt <- pic_flat$apt_link

for (x in apt){
  picture <- read_html(x) %>% html_element(".CmhTt") %>% html_attr("src")
  apt_pic <- rbind(apt_pic, data.frame(picture))
}
df_pic <- cbind(pic_flat, data.frame(apt_pic))

But some requests fail in the middle of the iteration. For example:

Error in open.connection(x, "rb") : HTTP error 502.

So I want to skip those pages, continue with the next one, and scrape the available picture URLs into my data frame. How can I use the tryCatch function, or any other method, to accomplish this?


Solution

  • We can wrap the scraping step in a function and then use tryCatch or purrr::possibly to skip the errors.

    First create a function f1 that gets the link to a picture:

    library(purrr)
    
    # function f1: returns the picture URL for one apartment page
    f1 <- function(x){
      x %>% read_html() %>% html_element(".CmhTt") %>% html_attr("src")
    }
    
    apt <- pic_flat$apt_link
    
    # now loop over the links, skipping errors (failed pages yield NA)
    apt_pic <- lapply(apt, possibly(f1, NA))
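
    Since the question asks about tryCatch specifically, here is a sketch of the same skip-on-error behaviour using only base R's tryCatch (no purrr); the selector and the `apt` vector are taken from the code above:

    ```r
    library(rvest)
    
    # f2 wraps the scraping step in tryCatch: if read_html() throws an
    # error (e.g. HTTP 502), the error handler returns NA instead of
    # stopping the loop
    f2 <- function(x){
      tryCatch(
        x %>% read_html() %>% html_element(".CmhTt") %>% html_attr("src"),
        error = function(e) NA_character_
      )
    }
    
    # sapply simplifies the result to a character vector;
    # failed pages appear as NA and can be filtered out afterwards
    apt_pic <- sapply(apt, f2)
    ```

    Returning `NA_character_` rather than plain `NA` keeps the result a consistent character vector, which makes it easier to cbind back onto pic_flat later.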