Tags: r, web-scraping, rvest, hidden-field

rvest handling hidden text


When I scrape a web page with rvest, the data I am looking for does not appear in the result.

I tried googling the issue without any luck. I also tried selecting the table with an XPath expression, but I get {xml_nodeset (0)}.

library(rvest)
url <- "https://www.nasdaq.com/market-activity/ipos"
IPOS <- read_html(url)
# html_elements() is the current name for the deprecated xml_nodes()/html_nodes()
IPOS %>% html_elements("tbody") %>% html_text()

Output:

[1] "\n            \n          \n          \n            \n          \n        "

I do not see any of the IPO data. The expected output should contain the table of "Priced" IPOs, with columns such as Symbol, Company Name, etc.
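A minimal, self-contained way to reproduce the symptom offline, assuming (as the answer below suggests) that the rows are inserted by JavaScript. The HTML string, the `ipos` id and the row values are made up for illustration; they are not the real nasdaq.com markup. The static header survives parsing, but the body is empty because `read_html()` only parses markup and never runs scripts.

```r
library(rvest)

# Hypothetical page: the <tbody> is empty in the HTML source, and the
# <script> would only fill it in when run by a real browser.
page <- '
<html><body>
  <table id="ipos">
    <thead><tr><th>Symbol</th><th>Company Name</th></tr></thead>
    <tbody></tbody>
  </table>
  <script>
    var row = document.querySelector("#ipos tbody").insertRow();
    row.insertCell().textContent = "ABCD";
    row.insertCell().textContent = "Example Corp";
  </script>
</body></html>'

doc <- read_html(page)

# The header cells are part of the static markup, so they are found...
headers <- html_text(html_elements(doc, "#ipos thead th"))
print(headers)

# ...but the script is never executed, so the body contains no rows
rows <- html_elements(doc, "#ipos tbody tr")
print(length(rows))
```

The same pattern (header present, `tbody` empty) is what the output above shows for the live page.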



Solution

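  • The table data appear to be loaded by JavaScript after the initial page load, so they are not in the HTML that read_html() downloads. You can use the RSelenium package to render the page in a real browser and then parse the rendered source with rvest.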

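    library(rvest)
    library(RSelenium)
    
    # Start a Selenium server and a Firefox client (the port is arbitrary)
    rD <- rsDriver(port = 1210L, browser = "firefox", check = FALSE)
    remDr <- rD$client
    
    url <- "https://www.nasdaq.com/market-activity/ipos"
    remDr$navigate(url)
    Sys.sleep(5)  # give the page's scripts time to fill in the tables
    
    # Parse the rendered DOM (not the raw HTML) and extract every <table>;
    # rvest >= 1.0 fills ragged tables automatically, so `fill` is not needed
    IPOS <- remDr$getPageSource()[[1]] %>% 
      read_html() %>% 
      html_table()
    
    str(IPOS)
    
    # The "Priced" table was the third one when this was written;
    # check str(IPOS), since the index depends on the page layout
    PRICED <- IPOS[[3]]
    
    # Close the browser and stop the Selenium server
    remDr$close()
    rD$server$stop()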
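Because the rows only arrive after the page's scripts run, reading the page source immediately after `navigate()` can still return empty tables, and a fixed pause is only a guess. A sketch of a polling alternative follows; `wait_until()` is a hypothetical helper written here, not part of RSelenium, and the `"tbody tr"` selector is an assumption about how the IPO rows are rendered. `findElements()` is the RSelenium client method; the usage is shown as comments because it needs a live browser.

```r
# Hypothetical helper (not part of RSelenium): call check() repeatedly until
# it returns TRUE or `timeout` seconds have elapsed. Returns TRUE on success
# and FALSE on timeout.
wait_until <- function(check, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    if (isTRUE(check())) return(TRUE)
    if (Sys.time() >= deadline) return(FALSE)
    Sys.sleep(interval)
  }
}

# Usage with an RSelenium client `remDr` (assumes the IPO rows are rendered
# as <tr> elements inside a <tbody>):
# remDr$navigate("https://www.nasdaq.com/market-activity/ipos")
# ready <- wait_until(function() {
#   length(remDr$findElements(using = "css selector", "tbody tr")) > 0
# }, timeout = 15)
# if (!ready) stop("IPO table did not load within 15 seconds")
```

Polling returns as soon as the rows exist, and the timeout still bounds how long the script waits if the page never finishes loading.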