Scraping URLs from Bing image search results


I'm building a scraping script in R to fetch product images from search engines. So far I've managed to fetch the image-result URLs from Google Image Search with the snippet below:

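library(httr)     # GET()
library(rvest)    # read_html(), html_nodes(), html_attr(), %>%
library(stringr)  # str_detect()

google_urls <- GET("https://www.google.com/search?q=WWF%20CUB%20CLUB%20WWF16215003&tbm=isch", user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36") %>%
  read_html() %>%
  html_nodes(xpath = "//td/a") %>%       # anchors that are direct children of <td>
  html_attr("href") %>%
  `[`(str_detect(., "/url\\?")) %>%      # keep only hrefs containing "/url?"
  strsplit("=|\\&") %>%                  # split each href on "=" and "&"
  sapply(`[`, 2)                         # second piece: value of the first query parameter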

To broaden the search, I'd also like to scrape similar URLs from Bing. However, when I adapt the code for Bing as shown below, I get no results back: bing_urls is empty after running the code block.

bing_urls <- GET("https://www.bing.com/images/search?q=WWF%20CUB%20CLUB%20WWF16215003", user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36") %>%
                read_html() %>%
                html_nodes(xpath = "//td/a") %>% 
                html_attr("href") %>%
                `[`(str_detect(., "/url\\?")) %>%
                strsplit("=|\\&") %>%
                sapply(`[`, 2)

How can I amend the Bing code block so that it returns the same kind of URLs as the Google one?


Solution

  • The code below returns similar results to the example code you gave for Google Search:

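    library(httr)
    library(rvest)  # provides read_html(), html_nodes(), html_attr() and the %>% pipe
    
    GET("https://www.bing.com/images/search?q=WWF%20CUB%20CLUB%20WWF16215003", user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36") %>%
      read_html() %>%
      html_nodes(".lnkw") %>%   # elements with class "lnkw"
      html_nodes("a") %>%       # links nested inside them
      html_attr("href")         # the link targets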
    

    It would also be remiss of me not to mention that Bing has an API and that scraping appears to be against their terms of use. Go easy on their servers, or you're likely to get blocked.
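
If you want to see what the `.lnkw` / `a` / `href` selector chain does without sending any requests to Bing, here is a minimal offline sketch. The HTML string is hypothetical: it is made up for illustration and only assumed to resemble the markup that the `.lnkw` selector targets. The real page may be structured differently and can change at any time.

```r
library(rvest)  # read_html(), html_nodes(), html_attr(), %>%

# Hypothetical markup (NOT copied from Bing): two containers with class
# "lnkw" that each wrap a link, plus one unrelated link.
sample_html <- '
<html><body>
  <div class="lnkw"><a href="https://example.com/a.jpg">A</a></div>
  <div class="lnkw"><a href="https://example.com/b.jpg">B</a></div>
  <div class="other"><a href="https://example.com/ignored">C</a></div>
</body></html>'

hrefs <- read_html(sample_html) %>%
  html_nodes(".lnkw") %>%   # keep only elements with class "lnkw"
  html_nodes("a") %>%       # links nested inside those elements
  html_attr("href")         # extract each link target

print(hrefs)
```

Only the two links inside the `lnkw` containers are returned; the third is skipped because its parent has a different class. If the live query ever comes back empty, checking the class names in the HTML you actually receive is probably the first thing to try.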