Tags: html, r, web-scraping, rvest, rselenium

Scrape data HTML table in iframe using R


I am trying to scrape the following table from this link:

https://www.price.moc.go.th/en/home_en

Using rvest, the following doesn't work, and I can't figure out what to change:

library(rvest)

# Read html code
html <- read_html("https://www.price.moc.go.th/en/home_en")

# Go into specific css selector
check <- html %>%
  html_nodes("#cpi_index > td:nth-child(2)") %>%
  html_text()


Solution

  • The required table is inside an iframe, which loads its content from a separate URL:

    https://www.price.moc.go.th/price_index/index_price01.html
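
If you would rather look up the iframe URL from R than from the browser's developer tools, here is a minimal sketch. The inline HTML is a hypothetical stand-in for the parent page (the real markup may differ), so the example runs without network access; in practice you would pass the page URL to read_html() instead. It assumes the iframe tag is present in the static HTML rather than injected by a script.

```r
library(rvest)

# Hypothetical reduction of the parent page: the table is not in this
# document, only an <iframe> pointing at the page that holds it.
parent <- read_html(
  '<html><body>
     <iframe src="/price_index/index_price01.html"></iframe>
   </body></html>'
)

# Collect the src attribute of every iframe in the parent document
iframe_src <- parent %>%
  html_elements("iframe") %>%
  html_attr("src")

# Resolve relative paths against the parent page's URL
iframe_url <- xml2::url_absolute(
  iframe_src,
  "https://www.price.moc.go.th/en/home_en"
)
iframe_url
```

The resolved URL is the one to scrape, since CSS selectors run against the parent page cannot see into the iframe's own document.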

    You need RSelenium to get the table, presumably because the page fills it in with JavaScript, which read_html() does not run.

    library(RSelenium)
    library(rvest)
    library(dplyr)

    url <- "https://www.price.moc.go.th/price_index/index_price01.html"

    # Start the browser (launches a Selenium server and a Firefox session)
    driver <- rsDriver(browser = "firefox")
    remDr <- driver[["client"]]
    remDr$navigate(url)

    # Get the rendered page source and parse every table in it
    df <- remDr$getPageSource()[[1]] %>%
      read_html() %>%
      html_table()

    # The first table is the one we want
    df[[1]]
    # A tibble: 4 x 5
      INDEX    `Nov  21` `M/M` `Y/Y` `A/A`
      <chr>        <dbl> <dbl> <dbl> <dbl>
    1 CPI           102.  0.28  2.71  1.15
    2 CORE-CPI      101.  0.09  0.29  0.23
    3 PPI           106.  1.2   8.5   4.4 
    4 CMI           116   0.5  10.4   7.9
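
The snippet above leaves the Firefox window and the Selenium server running. One way to tidy that up is to wrap the steps in a function that cleans up on exit. This is only a sketch: `scrape_rendered_tables` and its `wait_seconds` argument are hypothetical names that are not part of RSelenium, and the fixed delay is an assumed, crude way to give the page time to render.

```r
# Hypothetical helper (not from the original answer): render a page in a
# Selenium-driven Firefox and return every HTML table found in it.
scrape_rendered_tables <- function(url, wait_seconds = 2) {
  driver <- RSelenium::rsDriver(browser = "firefox", verbose = FALSE)
  remDr <- driver[["client"]]

  # Release the browser and the Selenium server even if a later step fails
  on.exit({
    remDr$close()
    driver[["server"]]$stop()
  }, add = TRUE)

  remDr$navigate(url)
  Sys.sleep(wait_seconds)  # assumed delay so scripts can fill in the table

  # Parse the rendered source and return a list with one tibble per <table>
  page <- xml2::read_html(remDr$getPageSource()[[1]])
  rvest::html_table(page)
}

# Usage (requires Firefox and a working Selenium setup):
# tables <- scrape_rendered_tables(
#   "https://www.price.moc.go.th/price_index/index_price01.html"
# )
# tables[[1]]
```

Using on.exit() means the cleanup runs whether the scrape succeeds or throws an error, so repeated runs do not fail because the port is still in use.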