Tags: html, r, web-scraping, rvest, finance

How to scrape a table using html_table in R if there is no table tag?


I have been trying to scrape tables from Yahoo Finance. When I inspect the page and locate the income statement, there is no table tag in the markup. I can extract the data with the html_text function, but html_table does not work on the same node.

    link <- "https://finance.yahoo.com/quote/"
    link <- paste0(link, tic[2], "/financials?p=", tic[2])
    wahis.session <- html_session(link)
    p <- wahis.session %>%
      html_nodes(xpath = '//*[@id="Col1-1-Financials-Proxy"]/section/div[3]')

    p <- html_table(p, header = FALSE, trim = TRUE, fill = TRUE)


Solution

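  • html_table only parses <table> elements, and this page builds the statement from non-table elements, so the rows have to be selected by class instead. The discussion at https://stackoverflow.com/questions/58315274/r-web-scraping-yahoo-finance-after-2019-change addresses this issue. Based on that discussion, you can obtain the data as follows for "AAPL":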

    library(rvest)
    library(tidyverse)  # provides stringr::str_match_all

    tic <- "AAPL"
    link <- "https://finance.yahoo.com/quote/"
    link <- paste0(link, tic, "/financials?p=", tic)
    # html_session() was renamed session() in rvest 1.0.0
    p <- html_session(link)
    # Each row of the statement is an element with class "fi-row"
    nodes <- p %>% html_nodes(".fi-row")

    df <- NULL

    for (i in nodes) {
      # Row label (has a title attribute) followed by the value cells
      r <- i %>% html_nodes("[title],[data-test='fin-col']") %>% html_text()
      df <- rbind(df, as.data.frame(matrix(r, ncol = length(r), byrow = TRUE), stringsAsFactors = FALSE))
    }

    # Column headers: the period end dates found in the container text
    matches <- str_match_all(p %>% html_node('#Col1-1-Financials-Proxy') %>% html_text(), '\\d{1,2}/\\d{1,2}/\\d{4}')
    headers <- c('Breakdown', 'TTM', matches[[1]][, 1])
    names(df) <- headers
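
Since the live page can change or block automated requests, it may help to try the same idea offline first. The sketch below runs the row-by-row extraction against a small inline HTML string. The markup is hypothetical and only imitates a div-based layout; it is not Yahoo's real HTML, and only the class, attribute and id names from the code above are reused. The numbers and the "header" class are made up for illustration. It also uses base R `regmatches` in place of `str_match_all`, so that rvest is the only dependency.

```r
library(rvest)

# Hypothetical markup imitating a div-based "table" (NOT Yahoo's real HTML).
html <- '
<div id="Col1-1-Financials-Proxy">
  <div class="header">Breakdown TTM 9/30/2020 9/30/2019</div>
  <div class="fi-row"><span title="Total Revenue">Total Revenue</span>
    <div data-test="fin-col">300</div>
    <div data-test="fin-col">274</div>
    <div data-test="fin-col">260</div></div>
  <div class="fi-row"><span title="Net Income">Net Income</span>
    <div data-test="fin-col">60</div>
    <div data-test="fin-col">57</div>
    <div data-test="fin-col">55</div></div>
</div>'
page <- read_html(html)

# There is no <table> element, so html_table() has nothing to parse.
n_tables <- length(html_nodes(page, "table"))

# One character vector per row: the label first, then the value cells.
rows <- html_nodes(page, ".fi-row")
cells <- lapply(rows, function(row) {
  html_text(html_nodes(row, "[title],[data-test='fin-col']"), trim = TRUE)
})

# Stack the rows into a data frame (assumes every row has the same length).
df <- as.data.frame(do.call(rbind, cells), stringsAsFactors = FALSE)

# Take the date headers from the container text with a base R regex.
txt <- html_text(html_node(page, "#Col1-1-Financials-Proxy"))
dates <- regmatches(txt, gregexpr("\\d{1,2}/\\d{1,2}/\\d{4}", txt, perl = TRUE))[[1]]
names(df) <- c("Breakdown", "TTM", dates)

print(df)
```

Building the rows with `lapply` and a single `rbind` avoids growing the data frame inside a loop, but it assumes every row yields the same number of cells. If the real page has rows of different lengths, they would need padding before being combined.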