web-scraping, rvest, httr, jsonlite

Rvest won't return data


I have been trying to scrape the following table:


Solution

  • Your request returned an HTML page rather than a JSON response, so parsing it as JSON failed with the error you saw.
    (I can't tell whether that was because you left out accept_json() or because the URL you used was slightly off.)
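
    A quick way to catch this kind of mismatch early is to look at the Content-Type of the response before parsing it. The snippet below is only a minimal sketch: is_json_type() is a hypothetical helper (it is not part of httr), and the header values are made up for illustration.

    ```r
    # Hypothetical helper (not part of httr): TRUE when a Content-Type header
    # value announces JSON, ignoring case and parameters like "; charset=utf-8".
    is_json_type <- function(content_type) {
      if (is.null(content_type) || is.na(content_type)) return(FALSE)
      # Keep only the media type in front of the first ";"
      media_type <- trimws(strsplit(tolower(content_type), ";", fixed = TRUE)[[1]][1])
      identical(media_type, "application/json")
    }

    # Made-up header values for illustration
    is_json_type("application/json; charset=utf-8")  # JSON API response
    is_json_type("text/html; charset=UTF-8")         # regular HTML page
    ```

    With a real httr response you could pass in headers(req)[["content-type"]], or compare the result of http_type(req) directly, and stop with a clear message instead of letting the JSON parser fail.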

    Either way, after reverse engineering the essentials of the API request behind the table you linked, you'd put together something like this:

    library(httr)
    library(dplyr)
    library(purrr)
    
    # Visit the site once to receive the XSRF-TOKEN cookie, then URL-decode its value
    first_req <- GET("https://www.barchart.com")
    xsrf_token <- cookies(first_req) %>% filter(name == 'XSRF-TOKEN') %>% pull(value) %>% URLdecode()
    
    req <- GET(
        "https://www.barchart.com/proxies/core-api/v1/quotes/get",
        query = list(
          lists = "stocks.optionable.by_sector.all.us",
          fields = "symbol,symbolName,lastPrice,priceChange,percentChange,highPrice,lowPrice,volume,tradeTime,symbolCode,symbolType,hasOptions",
          orderBy = "symbol",
          orderDir = "asc",
          meta = "field.shortName,field.type,field.description",
          hasOptions = TRUE,
          #page = 1,
          #limit = 100,
          raw = 1
        ),
        content_type_json(),
        accept_json(),
        add_headers(
          "x-xsrf-token" = xsrf_token,
          "referer" = "https://www.barchart.com/options/stocks-by-sector?page=1"
        )
      )
    
    # Parse the JSON body, take its `data` element, and row-bind the flattened records
    table_data <- req %>%
      content() %>%
      .$data %>%
      map_dfr(unlist)
    

    This will get you the full list of 4258 items and coerce it into a tibble for convenience :)
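
    To see what the last step (unlist on every record, then row-binding) does, here is a rough base R sketch on mock data. The records below are invented, and the nested `raw` element is an assumption about the payload shape, not something confirmed by the API. Unlike map_dfr(), the rbind() approach here expects every record to have the same fields.

    ```r
    # Mock of content(req)$data: a list of records, each with a nested `raw` list.
    # All values are invented for illustration.
    records <- list(
      list(symbol = "A", lastPrice = "1.50", raw = list(symbol = "A", lastPrice = 1.5)),
      list(symbol = "B", lastPrice = "2.25", raw = list(symbol = "B", lastPrice = 2.25))
    )

    # unlist() flattens each record into a named character vector; nested names
    # are joined with a dot (raw$lastPrice becomes "raw.lastPrice").
    flat <- lapply(records, unlist)

    # Bind the vectors row-wise into a data frame (every column is character).
    table_data <- as.data.frame(do.call(rbind, flat), stringsAsFactors = FALSE)

    # Convert the flattened numeric column back to numbers.
    table_data$raw.lastPrice <- as.numeric(table_data$raw.lastPrice)

    names(table_data)
    ```

    Because unlist() coerces mixed values to character, any column you want to compute on needs an explicit conversion like the as.numeric() call above.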