I am trying to scrape information from this website: https://www.cps.edu/schools/schoolprofiles/acero-santiago
In particular, I want to scrape the Supportive School designation found in the Reports tab.
For this school, I want to grab the "Established" text.
Here is my code so far:
library(rvest)

url <- "https://www.cps.edu/schools/schoolprofiles/acero-santiago"
page <- read_html(url)
# select the iframe element and read its src attribute
iframe <- page %>% html_element("iframe")
iframe_src <- html_attr(iframe, "src")
# prepend the site root to the src before fetching the iframe document
iframe_page <- read_html(paste0("https://www.cps.edu", iframe_src))
But once I get here, I still can't find a node to select, and I'm unable to scrape any information from the iframe page. For example:
data <- iframe_page %>% html_node("h4") %>% html_text()
I don't get any results.
Any thoughts?
If you search for that phrase ("This school has put in place systems") in the Network tab of your browser's dev tools, you'll find that it comes from the SchoolProgressReport API endpoint, which you can query directly:
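library(rvest)
library(dplyr)
library(stringr)
library(tidyr)

url <- "https://www.cps.edu/schools/schoolprofiles/acero-santiago"
progress_report <- read_html(url) %>%
  html_element("iframe.iframe-page") %>%
  html_attr("src") %>%
  str_extract("SchoolId=\\d+$") %>%
  paste0("https://www.cps.edu/api/schoolprofile/SchoolProgressReport?", .) %>%
  jsonlite::read_json(simplifyVector = TRUE) %>%
  # assumed step (missing from the post as captured): reshape to name/value rows
  as_tibble() %>%
  pivot_longer(everything(), values_transform = as.character)

progress_report %>%
  filter(str_detect(name, fixed("supportive_School_Award")))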
#> # A tibble: 2 × 2
#>   name                         value
#>   <chr>                        <chr>
#> 1 supportive_School_Award      ESTABLISHED
#> 2 supportive_School_Award_Desc This school has put in place systems and structu…
Created on 2023-02-22 with reprex v2.0.2
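If you need this for more than one school, it may help to pull the two steps (building the endpoint URL from the iframe src, and picking a field out of the name/value table) into small helpers. The following is only a sketch in base R: `build_report_url()` and `get_field()` are hypothetical names, the example src and SchoolId are made up, and it assumes the src always ends in `SchoolId=<digits>` as it does in the code above.

```r
# Hypothetical helper: build the SchoolProgressReport URL from an iframe src.
# Assumes the src ends with "SchoolId=<digits>".
build_report_url <- function(iframe_src) {
  m <- regexpr("SchoolId=[0-9]+$", iframe_src)
  if (m[1] == -1) {
    stop("no SchoolId found in iframe src: ", iframe_src)
  }
  school_id <- regmatches(iframe_src, m)
  paste0("https://www.cps.edu/api/schoolprofile/SchoolProgressReport?", school_id)
}

# Hypothetical helper: look up one field in a long name/value data frame
# shaped like the progress_report output above.
get_field <- function(report, field) {
  report$value[report$name == field]
}

# Made-up iframe src and report, for illustration only (no network access)
example_src <- "/some/iframe/path?SchoolId=123456"
example_url <- build_report_url(example_src)

example_report <- data.frame(
  name  = c("supportive_School_Award", "some_other_field"),
  value = c("ESTABLISHED", "x")
)
example_award <- get_field(example_report, "supportive_School_Award")
```

With the real data you would pass the `src` attribute from `html_attr()` to `build_report_url()`, read the result with `jsonlite::read_json()`, and call `get_field()` on the reshaped table.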