I want to get the id_product and id_parent from this web page. Yesterday I could get the results, but when I ran the same code again today I got an error message. I'm running it from rstudio.cloud.
library(rvest)
library(httr)
library(stringr)
url <- "https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack"
headers <- c('User-Agent' = 'Mozilla/5.0')
doc <- read_html(httr::GET(url, httr::add_headers(.headers = headers))) %>%
  html_text()
id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_product
id_parent
Error in curl::curl_fetch_memory(url, handle = handle) :
Send failure: Connection was reset
I've searched for a possible explanation, but to no avail.
The server requires an extra header. Sending an Accept header along with the User-Agent lets the request go through:
library(rvest)     # read_html(), html_text()
library(httr)
library(stringr)
library(magrittr)
# Send an Accept header in addition to the User-Agent
headers <- c(
  'User-Agent' = 'Mozilla/5.0',
  'Accept' = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3'
)
doc <- read_html(httr::GET(url = 'https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack', httr::add_headers(.headers = headers))) %>%
  html_text()
# [[1]] is the match matrix for the single input string; column 2 is the first capture group
id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]
id_product
id_parent
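If it helps to check the extraction step without hitting the network, here is a minimal sketch that runs the same regular expressions on a made-up string. The `sample_text` value, the ID numbers in it, and the `extract_id()` helper are all hypothetical and only stand in for the page text; the real page may be laid out differently.

```r
library(stringr)

# Hypothetical stand-in for the text returned by html_text();
# the variable names mirror the patterns above, the numbers are invented.
sample_text <- "var product_id = 123456789; var parent_id = 987654;"

# Hypothetical helper: return the first capture group of every match,
# or NA_character_ when the pattern is not found at all.
extract_id <- function(text, pattern) {
  m <- str_match_all(text, pattern)[[1]]  # match matrix for the single input string
  if (nrow(m) == 0) {
    return(NA_character_)
  }
  m[, 2]  # column 1 is the full match, column 2 the captured digits
}

id_product <- extract_id(sample_text, 'product_id\\s+=\\s+(\\d+);')
id_parent  <- extract_id(sample_text, 'parent_id\\s+=\\s+(\\d+);')
id_missing <- extract_id(sample_text, 'shop_id\\s+=\\s+(\\d+);')

print(id_product)
print(id_parent)
print(id_missing)
```

Returning NA for a missing pattern makes it easier to tell "the request worked but the page changed" apart from a connection error like the one in the question.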