r, finance, google-finance

error using R's htmltab: Error in `*tmp*`[[index]] : subscript out of bounds


I am trying to download the S&P 500 index (INX) historical prices from Google Finance using

library(htmltab)
x <- htmltab(doc = "https://www.google.com/finance/historical?q=INDEXSP%3A.INX&ei=Qu-TWOn-AtW1mQGQ06WYCQ")

and it gives this error:

Error in `*tmp*`[[index]] : subscript out of bounds

Solution

  • I couldn't get htmltab to work, but you can parse the web page with the rvest package by selecting the table with its XPath:

    library(rvest)
    
    url <- "https://www.google.com/finance/historical?q=INDEXSP%3A.INX&ei=Qu-TWOn-AtW1mQGQ06WYCQ"
    
    read_html(url) %>%
        html_node(xpath = "//*[@class='gf-table historical_price']") %>%
        html_table()
    
    #            Date     Open     High      Low    Close        Volume
    # 1   Feb 2, 2017 2,276.69 2,283.97 2,271.65 2,280.85 2,321,960,100
    # 2   Feb 1, 2017 2,285.59 2,289.14 2,272.44 2,279.55 2,478,979,663
    # 3  Jan 31, 2017 2,274.02 2,279.09 2,267.21 2,278.87 2,555,320,206
    # 4  Jan 30, 2017 2,286.01 2,286.01 2,268.04 2,280.90 2,108,083,825
    # ...
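
Note that the values in the output above contain thousands separators, so the price and volume columns most likely come back as character strings rather than numbers. A minimal sketch of cleaning them up follows; the `prices` data frame is a hypothetical stand-in built from two of the rows shown above (the Open, High and Low columns are omitted for brevity), and the date parsing assumes an English locale:

```r
# Hypothetical sample mirroring two rows of the scraped table;
# every column is a character string, as in the output above.
prices <- data.frame(
  Date   = c("Feb 2, 2017", "Feb 1, 2017"),
  Close  = c("2,280.85", "2,279.55"),
  Volume = c("2,321,960,100", "2,478,979,663"),
  stringsAsFactors = FALSE
)

# Strip the thousands separators, then convert to numeric.
# as.numeric() is used rather than as.integer() because the volumes
# exceed R's 32-bit integer range.
num_cols <- c("Close", "Volume")
prices[num_cols] <- lapply(prices[num_cols], function(col) {
  as.numeric(gsub(",", "", col, fixed = TRUE))
})

# Parse dates such as "Feb 2, 2017"; %b matches abbreviated month
# names in the current locale (assumed to be English here).
prices$Date <- as.Date(prices$Date, format = "%b %d, %Y")

str(prices)
```

The same `lapply()` call can be applied to the real result of `html_table()` by listing all five numeric columns in `num_cols`.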