Using rvest to get a table


I am trying to scrape the table at: WEB TABLE

I have tried copying the XPath, but it does not return anything:

 require("rvest")
 url = "https://www.barchart.com/options/stocks-by-sector?page=1"
 pg = read_html(url)

 pg %>% html_nodes(xpath = "//*[@id='main-content-column']/div/div[4]/div/div[2]/div")

EDIT

I found the following link and feel I am getting closer.

So by using the same process I found the updated link by watching the XHR updates:

 url = paste0("https://www.barchart.com?access_token=",token,"/proxies/core-api/v1/quotes/",
         "get?lists=stocks.optionable.by_sector.all.us&fields=symbol%2CsymbolName",
         "%2ClastPrice%2CpriceChange%2CpercentChange%2ChighPrice%2ClowPrice%2Cvolume",
         "%2CtradeTime%2CsymbolCode%2CsymbolType%2ChasOptions&orderBy=symbol&orderDir=",
         "asc&meta=field.shortName%2Cfield.type%2Cfield.description&hasOptions=true&page=1&limit=100&raw=1")

Where the token is found within the scope:

 token = "eyJpdiI6IjJZMDZNOGYwUDk4dE1OcVc4ekdnUGc9PSIsInZhbHVlIjoib2lYcWtzRi9VN3ovbzdER2NhQlg0KzJQL1ZId2ZOeWpwSTF5YThlclN1SW9YSEtJbG9kR0FLbmRmWmtNcmd1eCIsIm1hYyI6ImU4ODA3YzZkZGUwZjFhNmM1NTE4ZjEzNmZkNThmZDY4ODE1NmM0YTM1Yjc2Y2E2OWVkNjZiZTE3ZDcxOGFlZjMifQ"

However, I do not know whether I am placing the token in the right part of the URL. When I ran:

 fixture <- jsonlite::read_json(url,simplifyVector = TRUE)

I received the following error:

 Error in parse_con(txt, bigint_as_char) : 
 lexical error: invalid char in json text.
                                   <!doctype html> <html itemscope
                 (right here) ------^

Solution

  • The token needs to be sent as a request header named x-xsrf-token, not passed as a URL parameter. The token value can also change between sessions, so read it from the cookie rather than hard-coding it. After that, convert the data to a data frame to get the result:

    library(rvest)

    # Open a session so the site sets its cookies
    # (html_session() is named session() in rvest >= 1.0).
    pg <- html_session("https://www.barchart.com/options/stocks-by-sector?page=1")
    cookies <- pg$response$cookies
    # The XSRF-TOKEN cookie is percent-encoded; decode it before sending it back.
    token <- URLdecode(cookies$value[cookies$name == "XSRF-TOKEN"])
    # request_GET() is an unexported rvest helper (hence `:::`),
    # so it may change between rvest versions.
    pg <-
      pg %>% rvest:::request_GET(
        "https://www.barchart.com/proxies/core-api/v1/quotes/get?lists=stocks.optionable.by_sector.all.us&fields=symbol%2CsymbolName%2ClastPrice%2CpriceChange%2CpercentChange%2ChighPrice%2ClowPrice%2Cvolume%2CtradeTime%2CsymbolCode%2CsymbolType%2ChasOptions&orderBy=symbol&orderDir=asc&meta=field.shortName%2Cfield.type%2Cfield.description&hasOptions=true&page=1&limit=1000000&raw=1",
        config = httr::add_headers(`x-xsrf-token` = token)
      )
    # Parse the JSON body, then bind each record's `raw` fields into one data frame.
    data_raw <- httr::content(pg$response)
    data <-
      purrr::map_dfr(
        data_raw$data,
        function(x){
          as.data.frame(x$raw)
        }
      )
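
    To try the two data-handling steps above without hitting the site, here is a minimal offline sketch in base R. The helper names (extract_xsrf_token, flatten_raw) and the sample cookie and response values are made up for illustration; only the shapes (a cookie table with name and value columns, and records that each carry a raw element) are taken from the code above.

```r
# Hypothetical helper: pull the XSRF token out of a cookie table shaped like
# pg$response$cookies (columns `name` and `value`).
extract_xsrf_token <- function(cookies) {
  value <- cookies$value[cookies$name == "XSRF-TOKEN"]
  if (length(value) != 1) {
    stop("expected exactly one XSRF-TOKEN cookie, found ", length(value))
  }
  # Cookie values are percent-encoded; the header needs the decoded form.
  utils::URLdecode(value)
}

# Hypothetical helper: bind the `raw` element of every record into one data frame.
flatten_raw <- function(records) {
  rows <- lapply(records, function(x) {
    as.data.frame(x$raw, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Made-up stand-ins for the real cookie table and the parsed JSON response.
fake_cookies <- data.frame(
  name  = c("session_id", "XSRF-TOKEN"),
  value = c("abc123", "eyJpdiI6IjJZ%3D%3D"),
  stringsAsFactors = FALSE
)
fake_response <- list(
  data = list(
    list(symbol = "A",  raw = list(symbol = "A",  lastPrice = 101.5, volume = 1200)),
    list(symbol = "AA", raw = list(symbol = "AA", lastPrice = 42.25, volume = 3400))
  )
)

token  <- extract_xsrf_token(fake_cookies)
quotes <- flatten_raw(fake_response$data)
print(token)
print(quotes)
```

    Stopping when the cookie is missing or duplicated is a deliberate choice in this sketch: a header sent with an empty token would most likely just get the HTML page back again, which is what produced the JSON parse error in the question.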