Downloading data for a large date range when parsing a Yahoo Finance URL

I have a script that parses Yahoo Finance's historical pricing data for a vector of ticker symbols. It uses the date codes in the URL to cover the timeframe from 1/1/2014 to yesterday. The script runs without errors, but I'm only getting the first 100 rows per ticker. It appears the problem is that Yahoo Finance (even with a large date range selected) only shows the first 100 results until you scroll down. Is there a workaround?

You can see the issue going here...

# Example to test...
library(XML)  # provides htmlTreeParse(), getNodeSet(), readHTMLTable()

Ticker <- c("AMZN", "F")
maxDate <- 1548918000  # end of the date range, as a Unix timestamp

for (s in Ticker) {
  url <- paste0('https://finance.yahoo.com/quote/', s, '/history?period1=1388559600&period2=', maxDate, '&interval=1d&filter=history&frequency=1d')
  webpage <- readLines(url, warn = FALSE)
  html <- htmlTreeParse(webpage, useInternalNodes = TRUE, asText = TRUE)
  tableNodes <- getNodeSet(html, "//table")
  # Parse the first table on the page and tag each row with its ticker
  df <- readHTMLTable(tableNodes[[1]],
                      header = c("Date", "Open", "High", "Low", "Close", "Adj. Close", "Volume"))
  df['Symbol'] <- s
  assign(s, df)  # store the data frame under the ticker's name
}

# Collect the per-ticker data frames into a named list and stack them
tickerDataList <- mget(Ticker)
tickerData <- do.call(rbind, tickerDataList)

The expected result would be the same table, but with dates going back to 1/1/14. That would mean a couple thousand rows instead of two hundred.
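For reference, the period1 and period2 values in the URL are Unix timestamps (seconds since 1970-01-01 UTC). The sketch below shows one way to compute them from calendar dates instead of hard-coding them. The helper name to_epoch is hypothetical, and the use of midnight UTC is an assumption: the value 1388559600 used above is 25200 seconds (7 hours) later than midnight UTC on 1/1/2014, so it was presumably generated in a UTC-7 time zone.

```r
# Hypothetical helper (not from the original script): convert a date string
# such as "2014-01-01" to Unix epoch seconds at midnight in the given zone.
to_epoch <- function(date, tz = "UTC") {
  as.numeric(as.POSIXct(date, tz = tz))
}

period1 <- to_epoch("2014-01-01")               # 1388534400 (midnight UTC)
period2 <- to_epoch(format(Sys.Date() - 1))     # yesterday, midnight UTC

# Build the history URL for one ticker from the computed timestamps
url <- paste0("https://finance.yahoo.com/quote/", "AMZN",
              "/history?period1=", format(period1, scientific = FALSE),
              "&period2=", format(period2, scientific = FALSE),
              "&interval=1d&filter=history&frequency=1d")
```

This only changes how the date range is specified; it does not affect how many rows the page renders.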

Solution

We can use the approach proposed in this answer: drive a real browser with RSelenium and scroll the page so that more rows get loaded. For instance,

library(RSelenium)
library(rvest)

# Start a Selenium server and browser, then open the history page
rD <- rsDriver()
remDr <- rD[["client"]]
remDr$navigate("https://finance.yahoo.com/quote/AMZN/history?period1=1388559600&period2=1548918000&interval=1d&filter=history&frequency=1d")

# Scroll down in steps of 10000 pixels, pausing so new rows can load
for (i in 1:5) {
  remDr$executeScript(paste("scroll(0,", i * 10000, ");"))
  Sys.sleep(3)
}

# Parse the rendered page and extract its tables
page_source <- remDr$getPageSource()
out <- read_html(page_source[[1]]) %>% html_nodes("table") %>% html_table()
nrow(out[[1]])
# [1] 801

801 rows is still not the full range, but scrolling more than 5 times (and perhaps increasing the 10000-pixel step) should eventually load every row.
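Rather than guessing the number of scrolls, one option is to keep scrolling until the row count stops growing. The following is a minimal sketch of that idea, not a tested recipe for Yahoo Finance. The helper scroll_until_stable and its arguments are hypothetical names introduced here; the scroll and counting actions are passed in as functions so the loop itself does not depend on the page. It assumes new rows appear within pause seconds of each scroll, otherwise it will stop early.

```r
# Hypothetical helper (not part of RSelenium): call `scroll(i)` repeatedly
# until `count_rows()` stops increasing or `max_scrolls` is reached.
# Returns the last row count observed.
scroll_until_stable <- function(scroll, count_rows, max_scrolls = 50, pause = 3) {
  previous <- -1
  current <- count_rows()
  n <- 0
  while (current > previous && n < max_scrolls) {
    n <- n + 1
    scroll(n)          # trigger loading of more rows
    Sys.sleep(pause)   # give the page time to render them
    previous <- current
    current <- count_rows()
  }
  current
}

# Possible wiring with the session above (assumes `remDr` is open on the
# history page and that the rows live in "table tbody tr"):
# n_rows <- scroll_until_stable(
#   scroll = function(i) remDr$executeScript(paste("scroll(0,", i * 10000, ");")),
#   count_rows = function() length(remDr$findElements(using = "css selector",
#                                                     value = "table tbody tr"))
# )
```

Once the count is stable, the page source can be parsed with read_html() and html_table() exactly as before.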