r, web-scraping, rcurl, httr

Error in curl::curl_fetch_memory(url, handle = handle) : Send failure: Connection was reset (RStudio.cloud)


I want to get the id_product and id_parent values from this web page. Yesterday the code below returned the results, but when I ran it again today I got an error. I'm running it on rstudio.cloud.

    library(httr)
    library(rvest)    # read_html(), html_text(), %>%
    library(stringr)  # str_match_all()

    url <- "https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack"

    headers <- c('User-Agent' = 'Mozilla/5.0')
    doc <- read_html(httr::GET(url, httr::add_headers(.headers = headers))) %>%
      html_text()
    id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
    id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]

    id_product
    id_parent

Error in curl::curl_fetch_memory(url, handle = handle) : 
  Send failure: Connection was reset

I've searched for a possible explanation, but to no avail.


Solution

  • The server requires an extra header (Accept) in addition to User-Agent

    library(httr)
    library(rvest)     # provides read_html() and html_text()
    library(stringr)
    library(magrittr)
    
    # Send an Accept header alongside the User-Agent
    headers <- c(
      'User-Agent' = 'Mozilla/5.0',
      'Accept' = 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3'
    )
    
    url <- 'https://www.tokopedia.com/zhafranseafood/cumi-asin-1kg-per-pack'
    doc <- read_html(httr::GET(url = url, httr::add_headers(.headers = headers))) %>%
      html_text()
    
    # Column 2 of the match matrix holds the captured digits
    id_product <- str_match_all(doc, 'product_id\\s+=\\s+(\\d+);')[[1]][, 2]
    id_parent <- str_match_all(doc, 'parent_id\\s+=\\s+(\\d+);')[[1]][, 2]
    
    id_product
    id_parent
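
If the request succeeds but the ids still come back empty, it may help to check the regular expressions separately from the network call. The following is only a rough sketch: the sample_text string is made up for illustration (it is not the real page source), and extract_ids is a hypothetical helper written in base R so it runs without extra packages. For these patterns it should give the same values as str_match_all(...)[[1]][, 2].

```r
# Hypothetical helper: return the digits captured after "<key> = " in `text`.
extract_ids <- function(text, key) {
  pattern <- paste0(key, "\\s+=\\s+(\\d+);")
  # All full matches, e.g. "product_id = 12345;"
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))[[1]]
  # Keep only the captured digits from each match
  sub(pattern, "\\1", hits, perl = TRUE)
}

# Made-up sample input (assumption), shaped like the text the regex expects
sample_text <- "var product_id = 12345; var parent_id = 678;"

extract_ids(sample_text, "product_id")  # "12345"
extract_ids(sample_text, "parent_id")   # "678"
```

When nothing matches, the helper returns an empty character vector, which is an easy way to tell a changed page layout apart from a failed request.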