Tags: r, rvest, yahoo-finance

Yahoo Finance - Web scraping with R - SelectorGadget doesn't work


I am trying to get the 1980 matching stocks from this Yahoo Finance Screener:

[https://finance.yahoo.com/screener/unsaved/38a77251-0996-439b-8be4-9d10ff18ff79?count=25&offset=0]

using R and rvest.

I normally use XPath, but I can't get a working selector from SelectorGadget on this website.

Could somebody suggest an alternative way to get all pages of this data?
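The screener URL above pages through results with the `count` and `offset` query parameters (`count=25&offset=0`). Assuming the offset simply advances by `count` rows per page, all page URLs for the 1980 results can be built up front. `screener_urls()` is a hypothetical helper, not part of rvest:

```r
# Hypothetical helper (not part of rvest): one screener URL per page of results
screener_urls <- function(base, n_results, page_size = 25) {
  # offsets 0, page_size, 2 * page_size, ... up to the start of the last page
  offsets <- seq(0, n_results - 1, by = page_size)
  paste0(base, "?count=", page_size, "&offset=", offsets)
}

base <- "https://finance.yahoo.com/screener/unsaved/38a77251-0996-439b-8be4-9d10ff18ff79"
urls <- screener_urls(base, n_results = 1980)
length(urls)   # 80 pages
tail(urls, 1)  # ends with offset=1975
```

Each URL can then be passed to `read_html()` in a loop or with `lapply()`.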

I would like code similar to the following, which worked for me on Investing. Please note that the Symbol, Name, and MarketCap selectors are just examples:

library(rvest)
library(dplyr)

pages <- vector("list", 80)  # 80 pages x 25 rows covers the 1980 results
for (z in 1:80) {
  # move the offset forward by 25 rows on each page: 0, 25, 50, ...
  url_base <- paste0("https://finance.yahoo.com/screener/unsaved/38a77251-0996-439b-8be4-9d10ff18ff79?count=25&offset=", (z - 1) * 25)
  zpg <- read_html(url_base)
  spans <- zpg %>% html_nodes("table") %>% html_nodes("span")
  # the attributes below are placeholders, not Yahoo's real markup
  Symbol    <- spans %>% html_attr("data-id")
  Name      <- spans %>% html_attr("data-name")
  MarketCap <- spans %>% html_attr("data-name")
  pages[[z]] <- data.frame(Symbol, Name, MarketCap)
}

# stack all pages into one data frame
USA <- bind_rows(pages)
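Instead of picking individual `span` attributes, rvest's `html_table()` converts a whole `<table>` into a data frame, which avoids needing a selector for every column. The sketch below runs on a small hand-written HTML string (hypothetical markup, not Yahoo's actual page). It only helps on the live site if the HTML that `read_html()` receives actually contains the results table, which is an assumption worth checking first:

```r
library(rvest)

# Stand-in for one downloaded screener page (hypothetical markup, not Yahoo's real HTML)
page <- read_html('
<table>
  <tr><th>Symbol</th><th>Name</th><th>Market Cap</th></tr>
  <tr><td>AAA</td><td>Alpha Corp</td><td>1.2B</td></tr>
  <tr><td>BBB</td><td>Beta Inc</td><td>350M</td></tr>
</table>')

# html_table() returns one data frame per <table>, using the <th> row as column names
tables <- html_table(page)
screen <- tables[[1]]
print(screen)
```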

Solution

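  • Instead of scraping the screener page, you could try using quantmod or tidyquant, which download price data from Yahoo Finance by ticker symbol.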

    library(tidyverse)
    library(tidyquant)
    
    # download the list of NASDAQ-listed symbols (pipe-delimited text file)
    nasdaq <- read_delim("https://nasdaqtrader.com/dynamic/SymDir/nasdaqlisted.txt", delim = "|")
    
    # fetch daily prices from Yahoo Finance for each symbol
    df <- nasdaq %>%
      head() %>% # only the first few symbols, for a quick test
      rowwise() %>%
      mutate(data = list(tq_get(Symbol, from = "2020-08-01", to = "2020-08-07", warnings = FALSE)))
    
    # combine the per-symbol tibbles into one data frame
    df2 <- df$data %>%
      bind_rows()
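When looping over every symbol rather than just `head()`, it may help to clean the symbol list first. The sketch assumes `nasdaqlisted.txt` ends with a "File Creation Time" footer row and has a `Test Issue` flag (Y/N); check the live file before relying on that. It uses a made-up sample in the same pipe-delimited layout, with the column set trimmed for brevity:

```r
# Made-up sample in the pipe-delimited layout assumed for nasdaqlisted.txt
raw <- "Symbol|Security Name|Test Issue|ETF
AAA|Alpha Corp - Common Stock|N|N
ZZZT|Test Security|Y|N
File Creation Time: 0101202100:00|||"

listed <- read.delim(text = raw, sep = "|", stringsAsFactors = FALSE, check.names = FALSE)

# Drop the footer row and any test issues before passing symbols to tq_get()
keep <- !startsWith(listed$Symbol, "File Creation Time") & listed$`Test Issue` == "N"
symbols <- listed$Symbol[keep]
print(symbols)
```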