I want to scrape the number of products sold from a marketplace webpage using rvest.
I used this code, but it returned only a blank string.
library(rvest)
doc <- read_html("https://www.tokopedia.com/berasprimasari/beras-bunga-25kg")
sold <- html_nodes(doc, ".rvm-product-info--item_value.mt-5.item-sold-count") %>%
  html_text()
sold
------------
RESULT:
[1] " "
EXPECTED:
[1] " 378 "
How can I adjust my code to extract that number?
Many thanks in advance!
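The number is retrieved dynamically from a product stats endpoint, which you can find in the network tab of your browser's developer tools, so the element is still empty in the HTML that read_html() downloads. The endpoint needs the product id, which you can grab from a request to the original URL. From the response, you could string split or simply use a regex to pull out the part giving the number sold.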
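library(rvest)    # read_html(), html_text()
library(stringr)  # str_match_all()
library(magrittr) # %>%
library(httr)     # GET(), add_headers()

# Fetch the product page and pull the numeric product id out of the page text
get_product_id <- function(url){
  headers <- c('User-Agent' = 'Mozilla/5.0')
  s <- read_html(httr::GET(url, httr::add_headers(.headers = headers))) %>% html_text()
  id <- str_match_all(s, 'product_id\\s+=\\s+(\\d+);')[[1]][,2]
  return(id)
}

url <- 'https://www.tokopedia.com/berasprimasari/beras-bunga-25kg'

# Query the product stats endpoint with that id
p <- read_html(paste0('https://js.tokopedia.com/productstats/check?pid=', get_product_id(url), '&callback=show_product_stats&_=')) %>%
  html_text()

# Capture the digits that follow item_sold":
number_sold <- str_match_all(p, 'item_sold\":(\\d+)')[[1]][,2]
number_sold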
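If you want to check the extraction step without hitting the site, here is a minimal offline sketch of the two options mentioned above (regex and string split). The `sample_response` string is a made-up payload that only assumes the `"item_sold":<digits>` shape implied by the regex; the real response may contain other fields. It uses base R only, so it runs without stringr.

```r
# Hypothetical payload, NOT a captured response; only the
# "item_sold":<digits> part is assumed from the regex in the answer.
sample_response <- 'show_product_stats({"item_sold":378,"success":1})'

# Regex approach: capture the digits that follow "item_sold":
m <- regmatches(sample_response,
                regexec('"item_sold":([0-9]+)', sample_response))
sold_regex <- as.integer(m[[1]][2])

# String-split approach: cut at the key, then keep the leading digits
after_key <- strsplit(sample_response, '"item_sold":', fixed = TRUE)[[1]][2]
sold_split <- as.integer(sub("^([0-9]+).*$", "\\1", after_key))

sold_regex
sold_split
```

Both approaches give the same integer here. The regex is probably the more robust of the two, since the split version depends on the key appearing exactly once in the response.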