r rvest

Rvest doesn't scrape from <span>


I'm trying to scrape prices from Amazon. This used to work, but it no longer does, and I don't know whether Amazon has added some protection or whether I'm using rvest incorrectly.

Here is the source code

I'm trying to scrape with this code:

library(rvest)
library(httr) # provides user_agent()

my_url <- "https://www.amazon.com/s?k=reusable+straws"
# Send a browser-like user agent with every request in the session
ua <- user_agent("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120")
my_session <- session(my_url, ua)

# Select the price spans
my_session %>%
  html_elements(".a-offscreen")

I can scrape the <a class> above just fine, and I can scrape the <span class="a-size-base a-color-secondary"> below it, but I can't scrape any of the price spans.

Any ideas?


Solution

  • Consider using a tool like SelectorGadget to identify the correct HTML elements to scrape. The code below selects each product box first and then extracts the title, price, and rating inside it:

    library(tidyverse)
    library(rvest)
    
    "https://www.amazon.com/s?k=reusable+straws" %>% 
      read_html() %>% 
      html_elements(".puis-card-border") %>% # Select each product box
      map_dfr(~ tibble( # Map over every box to extract info
        title = html_element(.x, ".a-color-base.a-text-normal") %>% 
          html_text2(), 
        price = html_element(.x, ".a-price") %>% 
          html_text2(), 
        rating = html_element(.x, ".aok-align-bottom") %>% 
          html_text2()
      ))
    
    # A tibble: 60 x 3
       title                                               price rating
       <chr>                                               <chr> <chr> 
     1 "HSHIJYA 18 Pack Reusable Stainless Steel Straws w~ $18.~ 4.7 o~
     2 "Piteno\u00ae 16-Pack Reusable Glass Straws, Clear~ $6.9~ 4.7 o~
     3 "Softy Straws Premium Reusable Stainless Steel Dri~ $12.~ 4.7 o~
     4 "15 FITS ALL TUMBLERS STRAWS - Reusable Silicone S~ $14.~ 4.6 o~
     5 "Tronco Set of 6 Stainless Steel Reusable Metal St~ $9.9~ 4.6 o~
     6 "Hiware 12-Pack Reusable Stainless Steel Metal Str~ $6.2~ 4.8 o~
     7 "24 PCS, Reusable Straws with 4 Brushes, 10.5\" Lo~ $5.9~ 4.6 o~
     8 "Kynup Reusable Straws, 4Pack Collapsible Portable~ $9.9~ 4.6 o~
     9 "Ello Impact Reusable Hard Plastic Straws with Cle~ $3.4~ 4.7 o~
    10 "ALINK 10.5 in Long Rainbow Colored Reusable Trita~ $4.9~ 4.7 o~
    # i 50 more rows
    # i Use `print(n = ...)` to see more rows
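
The per-box approach can be tried offline on a small inline page. The following is a minimal sketch, not Amazon's real markup: the class names are copied from the code above, while the two products and the nesting of .a-offscreen inside .a-price are assumptions made for illustration. It shows that html_element() returns a missing node (and html_text2() returns NA) when a box has no price, so titles and prices stay aligned, whereas a page-wide html_elements(".a-offscreen") simply returns fewer values.

```r
library(rvest)
library(purrr)
library(tibble)

# Hypothetical markup imitating a results page; the products are made up
page <- minimal_html('
  <div class="puis-card-border">
    <span class="a-color-base a-text-normal">Steel straws</span>
    <span class="a-price"><span class="a-offscreen">$18.99</span></span>
  </div>
  <div class="puis-card-border">
    <span class="a-color-base a-text-normal">Glass straws</span>
  </div>
')

cards <- html_elements(page, ".puis-card-border")

# An empty node set may mean the server sent a different page than the
# one seen in the browser, so report the page title and stop early
if (length(cards) == 0) {
  stop("No product boxes found; page title: ",
       html_text2(html_element(page, "title")))
}

# One row per box; a box without a price yields NA instead of being dropped
products <- map_dfr(cards, ~ tibble(
  title = html_text2(html_element(.x, ".a-color-base.a-text-normal")),
  price = html_text2(html_element(.x, ".a-price .a-offscreen"))
))

# Strip everything except digits and the decimal point; NA stays NA
products$price_num <- as.numeric(gsub("[^0-9.]", "", products$price))

# Page-wide selection returns only the prices that exist, so they can
# no longer be matched to their titles
all_prices <- html_text2(html_elements(page, ".a-offscreen"))

print(products)
```

If the length check fails on the live page, inspecting the returned title is a quick way to see whether the response was a search-results page at all.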