I want to web scrape the URLs of pictures from a list of web pages. I tried the following code.
library(rvest)

pic_flat <- data.frame()

for (i in 7:60){
  # build the url of each listing page
  link <- paste0("https://www.immobilienscout24.at/regional/wien/wien/wohnung-kaufen/seite-", i)
  page <- read_html(link)
  # scrape the href of each apartment and build its full url
  href <- page %>% html_elements("a.YXjuW") %>% html_attr('href')
  apt_link <- paste0("https://www.immobilienscout24.at", href)
  pic_flat <- rbind(pic_flat, data.frame(apt_link))
}
# get the link to the apartment picture
apt_pic <- data.frame()
apt <- pic_flat$apt_link

for(x in apt){
  picture <- read_html(x) %>% html_element(".CmhTt") %>% html_attr("src")
  apt_pic <- rbind(apt_pic, data.frame(picture))
}

df_pic <- cbind(pic_flat, data.frame(apt_pic))
But some web pages crash in the middle of the iteration. For example:

Error in open.connection(x, "rb") : HTTP error 502.

So I want to skip those web pages, continue with the next one, and scrape the available picture URLs into my data frame. How can I use the tryCatch function, or any other method, to accomplish this?
We can create a function and then use tryCatch or purrr::possibly to skip the errors.
First, create a function f1 that gets the link to the picture:
# f1: read one apartment page and return the URL of its picture
f1 <- function(x){
  x %>% read_html() %>% html_element(".CmhTt") %>% html_attr("src")
}
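You can sanity-check f1 on a single link before looping, assuming the first entry in pic_flat is reachable:

# quick check on the first apartment link
f1(pic_flat$apt_link[1])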
library(purrr)

apt <- pic_flat$apt_link

# now loop over the links, skipping pages that error out
apt_pic <- lapply(apt, possibly(f1, otherwise = NA))
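possibly() returns NA for the pages that fail, so the result has one entry per apartment link and can be bound back onto pic_flat, assuming you still want the df_pic data frame from your question:

# combine the apartment links with the scraped picture URLs
df_pic <- cbind(pic_flat, picture = unlist(apt_pic))

If you prefer base R over purrr, the same idea works with tryCatch; f1_safe below is just an illustrative name for the wrapped version:

# base R alternative: return NA whenever reading a page throws an error
f1_safe <- function(x){
  tryCatch(
    x %>% read_html() %>% html_element(".CmhTt") %>% html_attr("src"),
    error = function(e) NA
  )
}

apt_pic <- lapply(apt, f1_safe)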