r web-scraping rvest httr

Web Scraping Returns xml_nodeset (0) in R


I am trying to scrape this website: https://web.tmxmoney.com/earnings.php?qm_symbol=DOL, specifically the table at the bottom of the page. I have tried countless CSS selectors and XPath expressions but still get {xml_nodeset (0)}. I am looking for an intuitive explanation rather than just the code.

Here are a few of my attempts.

library(httr)
library(rvest)
library(dplyr)

tbl = read_html('https://web.tmxmoney.com/earnings.php?qm_symbol=DOL') %>%
               html_nodes("table") %>% .[2] %>% html_table(fill = TRUE) # no luck

tbl = read_html('https://web.tmxmoney.com/earnings.php?qm_symbol=DOL') %>%
               html_nodes(xpath = '//*[@id="DataTables_Table_0"]') %>% html_table(fill = TRUE) # {xml_nodeset (0)}

I have tried countless others, using SelectorGadget and inspecting the page source.


Solution

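  • I didn't check the complete Terms of Service, so please be aware that scraping might not be permitted. As for the intuition: the table is most likely built by JavaScript after the page loads (an id like DataTables_Table_0 is the kind the DataTables plugin assigns at runtime), so the HTML that read_html() downloads does not contain it yet and every selector comes back empty. Driving a real browser with RSelenium lets the script run before you read the page. The following should do the trick: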

    library(rvest)
    library(RSelenium)
    
    # Start a Selenium server and open a Firefox session
    mybrowser <- rsDriver(browser = 'firefox')
    
    link <- "https://web.tmxmoney.com/earnings.php?qm_symbol=DOL"
    mybrowser$client$navigate(link)
    
    # Optional check: print the table text to confirm the browser rendered it
    mybrowser$client$findElement(using = 'css selector', "#DataTables_Table_0")$getElementText()
    
    # Grab the rendered HTML of the table and parse it with rvest
    html.table <- mybrowser$client$findElement(using = 'css selector', "#DataTables_Table_0")
    webElem5txt <- html.table$getElementAttribute("outerHTML")[[1]]
    df.table <- read_html(webElem5txt) %>% html_table() %>% data.frame(.)
    
    # Close the browser window and stop the Selenium server
    mybrowser$client$close()
    mybrowser$server$stop()
    # Excerpt of the data:
    > df.table
       Var.1          Quarter.End X..EPS.Actual X..EPS.Estimate X..Estimates X..Surprise X..Surprise.1       Date
    1     NA 2019-07-31 (Q2 2020)          0.45            0.47            3       -0.02        -4.26% 2019-09-12
    2     NA 2019-04-30 (Q1 2020)          0.33            0.33            4        0.00         0.00% 2019-06-13
    3     NA 2019-01-31 (Q4 2019)          0.54            0.55            4       -0.01        -1.82% 2019-03-28
    4     NA 2018-10-31 (Q3 2019)          0.41            0.42            3       -0.01        -2.38% 2018-12-06
    5     NA 2018-07-31 (Q2 2019)          0.43            0.44            3       -0.01        -2.27% 2018-09-13
    6     NA 2018-04-30 (Q1 2019)          0.31            0.31            3        0.00         0.00% 2018-06-07
    7     NA 2018-01-31 (Q4 2018)          0.48            0.47            4        0.01         2.13% 2018-03-29
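
For intuition on why the original rvest attempts returned an empty node set, here is a minimal offline sketch. It assumes the page behaves like many DataTables pages, where the table is created by a script in the browser; the two HTML strings below are made-up stand-ins, not the real markup of the site, and the cell values are only illustrative.

```r
library(rvest)

# Hypothetical stand-in for what the server sends: an empty container plus a
# script that would create the table in a browser. No <table> tag is present.
static_html <- '
<html><body>
  <div id="earnings"></div>
  <script>
    var t = document.createElement("table");
    t.id = "DataTables_Table_0";
    document.getElementById("earnings").appendChild(t);
  </script>
</body></html>'

page <- read_html(static_html)

# read_html() only parses markup; it never runs the script, so both lookups
# come back as empty node sets.
tables <- html_nodes(page, "table")
by_id  <- html_nodes(page, xpath = '//*[@id="DataTables_Table_0"]')
length(tables)
length(by_id)

# Hypothetical stand-in for the outerHTML a real browser returns after the
# script has run. Now the table exists and html_table() can read it.
rendered_html <- '
<table id="DataTables_Table_0">
  <tr><th>Quarter</th><th>EPS</th></tr>
  <tr><td>Q2 2020</td><td>0.45</td></tr>
</table>'

rendered <- html_nodes(read_html(rendered_html), "#DataTables_Table_0")
df <- html_table(rendered)[[1]]
df
```

The selectors in the question were fine; they were simply applied to a document that did not contain the table yet. A quick way to check any page is to search the raw page source (not the browser inspector, which shows the rendered DOM) for the id you are targeting.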