Using rvest to download calendar information that changes depending on the input date


I am trying to use the rvest package to scrape the release calendar from this website https://www.cso.ie/en/csolatestnews/releasecalendar/

By default the page shows the releases for the coming 7 days, which is exactly what I'm looking for. However, when I use the read_html function from rvest, it doesn't appear to pick up those default entries as text, so I'm finding it difficult to extract the information.

Any help here would be great.

library(rvest)
library(dplyr)
library(xml2)

url <- read_html("https://www.cso.ie/en/csolatestnews/releasecalendar")
test <- url %>% html_nodes("td")

Solution

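    library(tidyverse)
    library(httr2)

    # The calendar data is available as JSON at this URL, so request it
    # directly instead of parsing the HTML page.
    "https://cdn.cso.ie/static/data/ReleaseCalendar.json" %>%
      request() %>%                               # build the request
      req_perform() %>%                           # send it
      resp_body_json(simplifyVector = TRUE) %>%   # parse the JSON body
      pluck("releases") %>%                       # keep the "releases" element
      as_tibble()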
    
    # A tibble: 220 × 10
       dateindex releasedate dayname   title         refpe…¹ status sector subse…² subse…³ comment
           <int> <chr>       <chr>     <chr>         <chr>   <chr>  <chr>  <chr>   <chr>   <chr>  
     1      5325 20/03/2023  Monday    "Fuel Excise… Januar… Confi… Envir… Energy  https:… ""     
     2      5185 21/03/2023  Tuesday   "Transport B… March … Confi… Busin… Transp… http:/… ""     
     3      5066 22/03/2023  Wednesday "Wholesale P… Februa… Confi… Econo… Prices  http:/… ""     
     4      5475 22/03/2023  Wednesday "Environment… 2020    Confi… Envir… Enviro… http:/… ""     
     5      5475 22/03/2023  Wednesday "Environment… 2020    Confi… Envir… Enviro… https:… ""     
     6      5476 24/03/2023  Friday    "COVID-19 Va… Series… Confi… Peopl… Health  http:/… ""     
     7      5150 24/03/2023  Friday    "Vital Stati… Quarte… Confi… Peopl… Births… http:/… ""     
     8      5151 28/03/2023  Tuesday   "Livestock S… Februa… Confi… Busin… Agricu… http:/… ""     
     9      5068 28/03/2023  Tuesday   "Retail Sale… Februa… Confi… Busin… Servic… http:/… ""     
    10      5152 29/03/2023  Wednesday "Crops and L… 2022    Confi… Busin… Agricu… http:/… ""     
    # … with 210 more rows, and abbreviated variable names ¹​refperiod, ²​subsector, ³​subsectorURL
    # ℹ Use `print(n = ...)` to see more rows
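
Since the question asks for the coming 7 days only, here is a minimal sketch of one way to narrow the result down. It assumes releasedate stays a character column in dd/mm/yyyy format, as in the output above. The `releases` data frame, its titles, and the `from` date are made up for illustration so the example runs without a network call; with the real data you would likely use the tibble from the solution and `Sys.Date()` instead.

```r
# Hypothetical stand-in for the tibble returned by the solution; only the
# column names and the dd/mm/yyyy date format are taken from the output above.
releases <- data.frame(
  releasedate = c("20/03/2023", "22/03/2023", "24/03/2023", "29/03/2023"),
  title       = c("Release A", "Release B", "Release C", "Release D"),
  stringsAsFactors = FALSE
)

# Convert the character dates to Date objects.
releases$releasedate <- as.Date(releases$releasedate, format = "%d/%m/%Y")

# Keep releases from `from` up to and including 7 days later.
# A fixed date keeps the example reproducible; Sys.Date() is the
# natural choice against live data.
from <- as.Date("2023-03-20")
next_week <- releases[releases$releasedate >= from &
                        releases$releasedate <= from + 7, ]
next_week
```

With the sample rows, this keeps the three releases dated 20, 22 and 24 March 2023 and drops the one on 29 March.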