How do I scrape information from an `iframe` in R?

I am trying to scrape information from this website: https://www.cps.edu/schools/schoolprofiles/acero-santiago

In particular, I want to scrape the Supportive School designation found in the Reports tab.

I want to grab the "Established" text.

Here is my code so far:

library(rvest)

url <- "https://www.cps.edu/schools/schoolprofiles/acero-santiago"
page <- read_html(url)

# select the iframe element
iframe <- page %>% html_element("iframe")

iframe_src <- html_attr(iframe, "src")

iframe_page <- read_html(paste("https://www.cps.edu",iframe_src,sep=""))

But once I get here, I still can't find a node to select, and I'm unable to scrape any information from the page. For example:

data <- iframe_page %>% html_node("h4") %>% html_text()

I don't get any results.

Any thoughts?

Solution

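If you search for that phrase ("This school has put in place systems") in the Network tab of your browser's developer tools, you'll find the SchoolProgressReport API endpoint, which returns the report data as JSON. The code below takes the SchoolId from the iframe's `src` attribute and queries that endpoint directly: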

library(dplyr)
library(rvest)
library(stringr)

url <- "https://www.cps.edu/schools/schoolprofiles/acero-santiago"

progress_report <- read_html(url) %>% 
  html_element("iframe.iframe-page") %>% 
  html_attr("src") %>% 
  str_extract("SchoolId=\\d+$") %>% 
  paste0("https://www.cps.edu/api/schoolprofile/SchoolProgressReport?", .) %>% 
  jsonlite::read_json(simplifyVector = TRUE) %>% 
  tidyr::pivot_longer(everything())

progress_report %>% 
  filter(str_detect(name, fixed("supportive_School_Award")))
#> # A tibble: 2 × 2
#>   name                         value                                            
#>   <chr>                        <chr>                                            
#> 1 supportive_School_Award      ESTABLISHED                                      
#> 2 supportive_School_Award_Desc This school has put in place systems and structu…

Created on 2023-02-22 with reprex v2.0.2
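
If you need to do this for several schools, the URL-building step can be pulled out into a small helper. This is only a sketch: `build_report_url()` is a hypothetical name, the `src` value in the usage line is made up for illustration, and it assumes the iframe's `src` always ends in `SchoolId=<digits>`, as the `str_extract()` pattern above does.

```r
library(stringr)

# Hypothetical helper (not part of any package): turn an iframe `src` value
# into the SchoolProgressReport API URL used in the pipeline above.
build_report_url <- function(iframe_src) {
  # Same pattern as above: a trailing "SchoolId=<digits>"
  school_id <- str_extract(iframe_src, "SchoolId=\\d+$")
  if (is.na(school_id)) {
    # Fail loudly instead of requesting a URL that ends in "NA"
    stop("No trailing 'SchoolId=<digits>' found in iframe src: ", iframe_src)
  }
  paste0("https://www.cps.edu/api/schoolprofile/SchoolProgressReport?", school_id)
}

# Made-up src for illustration only; the real value comes from html_attr("src")
build_report_url("/example/path?SchoolId=123456")
#> [1] "https://www.cps.edu/api/schoolprofile/SchoolProgressReport?SchoolId=123456"
```

The explicit `is.na()` check is there because `str_extract()` returns `NA` when nothing matches, and `paste0()` would otherwise silently build a URL ending in `NA`.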