Tags: javascript, r, web-scraping, rvest, httr

Web scraping of a dynamic table


I want to scrape data from the table on this page.

But neither `GET()` from httr nor `read_html()` from rvest can read the table. I've checked the structure of this webpage and cannot find any POST or GET request that fetches the data when the page loads.


Solution

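  • From the page source we can see that the table is embedded in a frame. A frame's content is a separate HTML document fetched from its own URL, so `read_html()` on the outer page only sees the frame element, not the table inside it. The URL for the table itself is at this link.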

    So you can try:

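    library(rvest)  # read_html(), html_node(), html_table(); also re-exports %>%
    
    # URL of the frame document that holds the table (not the outer page)
    u <- "http://datacenter.mep.gov.cn:8099/ths-report/report!list.action?xmlname=1466632112484&V_YEAR=2016&V_waterplace=%27%E5%90%89%E6%9E%97%E6%BA%AA%E6%B5%AA%E5%8F%A3%27"
    
    mytable <- u %>%
      read_html() %>%          # fetch and parse the frame document
      html_node("table") %>%   # first <table> in it (html_element() in rvest >= 1.0)
      html_table()             # convert it to a data frame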
    

    Then do some cleaning up to deal with the non-English (Chinese) characters.
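If the frame's URL isn't known in advance, it can usually be read from the outer page, because each `<frame>`/`<iframe>` element carries the address of its document in its `src` attribute. Below is a minimal sketch of that lookup. To keep it runnable offline it parses a small inline HTML string instead of the real page, so `page_url` (`http://example.com/data/index.html`) and the frame path `report.html?year=2016` are hypothetical stand-ins. It uses xml2's `url_absolute()` to resolve a relative `src` against the page URL.

```r
library(rvest)  # read_html(), html_nodes(), html_attr(); also re-exports %>%
library(xml2)   # url_absolute()

# Hypothetical outer page; in practice this would be read_html(page_url)
page_url <- "http://example.com/data/index.html"
page <- read_html('<html><body>
  <h1>Water quality</h1>
  <iframe src="report.html?year=2016"></iframe>
</body></html>')

# The outer page itself contains no <table>; the data lives in the frame
n_tables <- length(html_nodes(page, "table"))
n_tables
# 0

# Collect the src of every <frame>/<iframe> and turn it into an absolute URL
frame_src <- page %>%
  html_nodes("iframe, frame") %>%
  html_attr("src")
frame_urls <- url_absolute(frame_src, base = page_url)
frame_urls
# "http://example.com/data/report.html?year=2016"

# Each frame URL can then be scraped like an ordinary page, e.g.:
# read_html(frame_urls[1]) %>% html_node("table") %>% html_table()
```

Resolving against the page URL matters because `src` values are often relative paths, which `read_html()` cannot fetch on their own.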