r, web-scraping, dplyr, rvest

For loop with two variables from different columns (paired, not nested)


I have a data frame with two columns. A loop should read a URL from the first column and then store that page's thumbnail under the name given in the second column of the same row.

My solution works for a single URL and a single name, but not when looping through the whole list.

The goal is to have the images stored as:

name1.jpg

name2.jpg

name3.jpg

name4.jpg

 list <- data.frame(
    urls=c("url1", "url2", "url3", "url4"),
    names=c("name1","name2","name3","name4")
  )

# This works for a single URL (z) and a single name
  page <- read_html(z)
  thumbnail_url <- page %>% html_node("meta[property='og:image']") %>% html_attr("content")
  thumbnail_id <- ("test.jpg")
  download.file(thumbnail_url,thumbnail_id, mode = 'wb')

  

# Tried nested for loops, with no success

  for (i in list$urls){
    for(j in list$names){
      
      page <- read_html(i)
      thumbnail_url <- page %>% html_node("meta[property='og:image']") %>% html_attr("content")
      thumbnail_id <- (paste(j,".jpg"))
      download.file(thumbnail_url,thumbnail_id, mode = 'wb')
      
    }
  }

# Using nrow and ncol didn't help me get the filename right

for (row in 1:nrow(list$urls)) {
  for (col in 1:ncol(list$names)) {
   
    page <- read_html(row)
    thumbnail_url <- page %>% html_node("meta[property='og:image']") %>% html_attr("content")
    thumbnail_id <- (paste(col,".jpg"))
    download.file(thumbnail_url,thumbnail_id, mode = 'wb')
    
  }
}

Solution

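  • Use a single loop and use its counter as a row index into both columns, so each URL is paired with the name from the same row. The nested loops fail because they run every URL against every name, and nrow(list$urls) returns NULL because a single column is a vector, not a table. Also use paste0 rather than paste, which would insert a space before ".jpg":

    library(rvest)  # provides read_html, html_node, html_attr and %>%

    for (i in seq_len(nrow(list))) {
        page <- read_html(list$urls[i])
        thumbnail_url <- page %>% html_node("meta[property='og:image']") %>% html_attr("content")
        thumbnail_id <- paste0(list$names[i], ".jpg")  # "name1.jpg", not "name1 .jpg"
        download.file(thumbnail_url, thumbnail_id, mode = 'wb')
    }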
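
As a rough alternative to the index loop above, the two columns can be walked in parallel with mapply. This is only a sketch under some assumptions: the data frame name thumbs and the helpers save_thumbnail and save_all are hypothetical names that do not appear in the question, the URLs are placeholders, and the fetch argument exists only so the URL/name pairing can be checked without network access.

```r
# Hypothetical stand-in for the question's data frame; the URLs are placeholders.
thumbs <- data.frame(
  urls  = c("url1", "url2", "url3", "url4"),
  names = c("name1", "name2", "name3", "name4"),
  stringsAsFactors = FALSE
)

# paste() separates its arguments with a space; paste0() does not.
spaced_names <- paste(thumbs$names, ".jpg")
file_names   <- paste0(thumbs$names, ".jpg")

# Hypothetical helper: fetch one page and save its og:image thumbnail.
# It needs the rvest package and network access when it is actually called.
save_thumbnail <- function(url, file) {
  page <- rvest::read_html(url)
  node <- rvest::html_node(page, "meta[property='og:image']")
  thumbnail_url <- rvest::html_attr(node, "content")
  download.file(thumbnail_url, file, mode = "wb")
}

# Walk both vectors in parallel: element i of urls goes with element i of files.
# A failing URL is reported and skipped instead of stopping the whole run.
save_all <- function(urls, files, fetch = save_thumbnail) {
  ok <- mapply(function(u, f) {
    tryCatch({
      fetch(u, f)
      TRUE
    }, error = function(e) {
      message("Skipping ", u, ": ", conditionMessage(e))
      FALSE
    })
  }, urls, files)
  invisible(ok)
}
```

Calling save_all(thumbs$urls, file_names) with real URLs would perform the downloads. Passing a different function as fetch (for example one that only prints its arguments) shows which URL is paired with which file name before anything is downloaded.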