R Code: Scrape ETF summary stats from Yahoo Finance

I want to scrape ETF summary stats from Yahoo Finance. On a ticker's page, the table to scrape sits below the graph, and the key fields are NAV, PE Ratio (TTM), Yield, Beta and Expense Ratio. I previously used the rvest package as follows, but that no longer works because the page structure has changed:

library(rvest)
library(dplyr)   # bind_cols()
library(purrr)   # map_df()

ticker <- "IVV"
url <- paste0("",ticker)
df <- url %>%
      read_html() %>%
      html_table() %>%
      map_df(bind_cols)

Any help appreciated


  • It looks like there is no longer a table element in that link, as the info you're after is now contained in list elements. I have tweaked the code to capture the label and values from each list element.

    library(rvest)
    library(dplyr)   # filter(), tibble()
    library(purrr)   # map_dfr()

    ticker <- "IVV"
    url <- paste0("",ticker)
    ivv_html <- read_html(url)
    node_txt <- ".svelte-tx3nkj" # This contains "table" info of interest
    df <- ivv_html %>% 
      html_nodes(paste0(".container", node_txt)) %>%
      # Build one tibble of label/value pairs per matched container
      map_dfr(~ tibble(
          label = html_nodes(.x, paste0(".label", node_txt)) %>% 
            html_text(trim = TRUE)
          ,value = html_nodes(.x, paste0(".value", node_txt)) %>% 
            html_text(trim = TRUE)
      ))
    df %>% 
      filter(label %in% c("NAV", "PE Ratio (TTM)", "Yield", "Beta (5Y Monthly)", "Expense Ratio (net)"))
    # A tibble: 5 × 2
      label               value 
      <chr>               <chr> 
    1 NAV                 519.85
    2 PE Ratio (TTM)      26.22 
    3 Yield               1.37% 
    4 Beta (5Y Monthly)   1.00  
    5 Expense Ratio (net) 0.03% 

    Adding the .container class limits the extraction to just the "table" located under the chart; otherwise, everything tagged with the class .svelte-tx3nkj on that page is extracted.
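
    As a minimal sketch of that scoping effect, the snippet below runs both selectors against a small hand-written HTML fragment. The markup is hypothetical and much simpler than the real Yahoo Finance page; only the class names mirror the ones used above.

    ```r
    library(rvest)

    # Hypothetical, simplified markup: elements inside and outside the
    # container share the same hashed class.
    page <- minimal_html('
      <div class="header svelte-tx3nkj">
        <span class="label svelte-tx3nkj">Ticker</span>
        <span class="value svelte-tx3nkj">IVV</span>
      </div>
      <div class="container svelte-tx3nkj">
        <ul>
          <li><span class="label svelte-tx3nkj">NAV</span>
              <span class="value svelte-tx3nkj">519.85</span></li>
          <li><span class="label svelte-tx3nkj">Yield</span>
              <span class="value svelte-tx3nkj">1.37%</span></li>
        </ul>
      </div>
    ')

    node_txt <- ".svelte-tx3nkj"

    # Without .container: every label carrying the hashed class is matched.
    all_labels <- page %>%
      html_nodes(paste0(".label", node_txt)) %>%
      html_text(trim = TRUE)

    # With .container: only labels nested inside the container are matched.
    scoped_labels <- page %>%
      html_nodes(paste0(".container", node_txt)) %>%
      html_nodes(paste0(".label", node_txt)) %>%
      html_text(trim = TRUE)

    print(all_labels)
    print(scoped_labels)
    ```

    Here the unscoped selector also picks up the "Ticker" label from the header, while the scoped one returns only the labels under the container.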

    UPD 2024-08-23, following HTML structure change:

    node_txt <- "yf-tx3nkj"
    # Reuses ivv_html and the libraries loaded in the code above
    df <- ivv_html %>% 
      html_nodes(paste0("ul.", node_txt)) %>% 
      html_nodes(paste0(".", node_txt)) %>% 
      map_dfr(function(.x) {
        tibble(
          label = html_nodes(.x, paste0(".label.", node_txt)) %>% 
            html_text(trim = TRUE)
          ,value = html_nodes(.x, paste0(".value.", node_txt)) %>%
            html_text(trim = TRUE)
        )
      })
    df %>% 
      filter(label %in% c("NAV", "PE Ratio (TTM)", "Yield", "Beta (5Y Monthly)", "Expense Ratio (net)"))
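
    The value column comes back as character. If numbers are needed, one possible conversion in base R is sketched below. The sample rows are copied from the output shown earlier; the value_num column name and the choice to express percentages as fractions are assumptions for illustration, not part of the scraping code itself.

    ```r
    # Sample rows copied from the example output (all values are strings).
    df <- data.frame(
      label = c("NAV", "PE Ratio (TTM)", "Yield",
                "Beta (5Y Monthly)", "Expense Ratio (net)"),
      value = c("519.85", "26.22", "1.37%", "1.00", "0.03%"),
      stringsAsFactors = FALSE
    )

    # Flag percentage values, strip "%" and thousands separators, then
    # convert; percentages are divided by 100 to give fractions.
    is_pct <- grepl("%$", df$value)
    num <- as.numeric(gsub("[%,]", "", df$value))
    df$value_num <- ifelse(is_pct, num / 100, num)

    print(df)
    ```

    Fields that Yahoo Finance reports as missing (for example "N/A" or "--") would become NA with a warning under this approach, so they may need separate handling.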