I’m trying to scrape this website using RVest: https://www.camara.cl/legislacion/sesiones_sala/sesiones_sala.aspx
Notice that the site loads quickly, but the data takes some time to appear. I realized that, while the content appears as html text in a web browser Inspector, the nodes appear empty when scraped using rvest
.
library(dplyr)
library(rvest)
camara <- "https://www.camara.cl/legislacion/sesiones_sala/sesiones_sala.aspx" %>%
session()
camara %>%
html_elements("h2")
camara %>%
html_elements(".box-proyecto")
camara %>%
html_elements("#trabajo-en-sala") %>%
html_elements("#info-tabs") %>%
html_elements("#ajax-container") %>%
html_elements("pnlTablaOrdinaria")
All of these should return at least some text content, but they appear empty.
I tried using V8
to interpret javascript according to these instructions, but the site appears to use JS only for interface elements, not for data retrieval.
I also tried to run it through PhantomJS
following these instructions, but couldn’t run the script due to permission issues.
It seems that I need to perform a GET request for the data, but the URL I found on the site’s code returns nothing: https://www.camara.cl/legislacion/sesiones_sala/tabla.aspx?_=1628291424652
I can’t use RSelenium as I’m working remotely through a headless server.
You need to pick up a session cookie (ASP.NET_SessionId) from the initial url. You could use session
for this, for example:
library(rvest)
library(magrittr)
r <- session('https://www.camara.cl/legislacion/sesiones_sala/sesiones_sala.aspx') %>%
session_jump_to('https://www.camara.cl/legislacion/sesiones_sala/tabla.aspx')
tables <- r %>% read_html() %>% html_table()