I'm coding in R and building a web scraping script to programmatically search Google for product images and download them into a folder. Inside a for-loop, one step gets the image URLs from the Google Images results page:
#Define the desired Google image search page
page <- read_html("https://www.google.com/search?q=Djeco%20DD04490%20image&tbm=isch&tbs=isz:lt,islt:0.5")
#Fetch the image URLs programmatically
image_urls <- page %>% html_nodes(".rg_i") %>% html_attr("data-src")
#Continue with the rest of the flow and download the image jpg files from the image URL list
...
However, image_urls is always empty, so I can't proceed further.
How may I resolve this and fetch the image urls from the example page?
You can find all the links in the href attribute of the a tags within td tags. You can then use string parsing to get the URLs:
library(rvest)
library(tidyverse)
image_urls <- "https://www.google.com/search?" %>%
  paste0("q=Djeco%20DD04490%20image&tbm=isch&tbs=isz:lt,islt:0.5") %>%
  read_html() %>%
  html_nodes(xpath = "//td/a") %>%
  html_attr("href") %>%
  `[`(str_detect(., "/url\\?")) %>%
  strsplit("=|\\&") %>%
  sapply(`[`, 2)
Resulting in:
image_urls
#> [1] "https://smallkins.com/products/djeco-multi-coloured-tent-dd04490"
#> [2] "https://smallkins.com/products/djeco-multi-coloured-tent-dd04490"
#> [3] "https://www.amazon.com.be/-/en/DD04490-DJECO-Cabin-Tent-Multicoloured/dp/B01DKANWME"
#> [4] "https://www.amazon.com.be/-/en/DD04490-DJECO-Cabin-Tent-Multicoloured/dp/B01DKANWME"
#> [5] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"
#> [6] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"
#> [7] "https://smallkins.com/products/djeco-multi-coloured-tent-dd04490"
#> [8] "https://smallkins.com/products/djeco-multi-coloured-tent-dd04490"
#> [9] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"
#> [10] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"
#> [11] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"
#> [12] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou"
#> [13] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"
#> [14] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"
#> [15] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"
#> [16] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"
#> [17] "https://angeloawards.com/item/G1974446"
#> [18] "https://angeloawards.com/item/G1974446"
#> [19] "https://www.mumzworld.com/ar/djeco-indoor-play-tent"
#> [20] "https://www.mumzworld.com/ar/djeco-indoor-play-tent"
#> [21] "https://www.amazon.co.jp/-/en/Oriental-DD04493-Indoor-Scandinavia-Stylish/dp/B085QKFB8L"
#> [22] "https://www.amazon.co.jp/-/en/Oriental-DD04493-Indoor-Scandinavia-Stylish/dp/B085QKFB8L"
#> [23] "https://undha.ac.id/ydzsqccfsw/vm-1975865.html"
#> [24] "https://undha.ac.id/ydzsqccfsw/vm-1975865.html"
#> [25] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"
#> [26] "https://www.amazon.co.jp/-/en/DD04490-Educational-Playhouse-Scandinavian-Christmas/dp/B01DKANWME"
#> [27] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"
#> [28] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"
#> [29] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou-and-toy-box"
#> [30] "https://www.crafts4kids.co.uk/djeco-indoor-play-tent-cabane-tinou-and-toy-box"
#> [31] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"
#> [32] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"
#> [33] "https://ssciindia.com/d/X1409951.html"
#> [34] "https://ssciindia.com/d/X1409951.html"
#> [35] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"
#> [36] "https://www.tickety-boo.co.uk/acatalog/Multicoloured-Tent-by-Djeco-5802.html"
#> [37] "https://www.doudous-et-peluches.com/achat/djeco-cabane-multicolore_332216"
#> [38] "https://www.doudous-et-peluches.com/achat/djeco-cabane-multicolore_332216"
#> [39] "https://toybox.lt/zaidimu-nameliai-palapines/1036177-djeco-spalvota-palapine-dd04490-3070900044906.html"
#> [40] "https://toybox.lt/zaidimu-nameliai-palapines/1036177-djeco-spalvota-palapine-dd04490-3070900044906.html"
Created on 2023-07-30 with reprex v2.0.2
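
Each link shows up more than once in the output above, so you may want to deduplicate before downloading. As a minimal sketch of the string-parsing step on its own, here is a base R version that runs without hitting Google. The href values are made up for illustration and only assume the "/url?q=...&..." shape that the pipeline filters on; clean_urls is a hypothetical name.

```r
# Hypothetical href values, shaped like the "/url?q=...&sa=..." redirect links
# (made up for illustration, not taken from a real results page)
hrefs <- c(
  "/url?q=https://example.com/products/tent&sa=U&ved=abc",
  "/url?q=https://example.com/products/tent&sa=U&ved=def",
  "/url?q=https://example.org/item%3Fid%3D42&sa=U&ved=ghi",
  "/search?q=something+else"
)

# Keep only the redirect links
redirects <- hrefs[grepl("^/url\\?", hrefs)]

# Capture everything between "q=" and the next "&"
targets <- sub("^/url\\?q=([^&]*).*$", "\\1", redirects)

# Decode percent-escapes and drop duplicates
clean_urls <- unique(
  vapply(targets, utils::URLdecode, character(1), USE.NAMES = FALSE)
)

print(clean_urls)
```

Capturing up to the next & rather than splitting on = means a literal = inside the target URL would not cut it short. If duplicates are all you want to remove, unique(image_urls) on the result above is enough.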