I'm trying to access files from this website: https://public.education.mn.gov/MDEAnalytics/DataTopic.jsp?TOPICID=11 I want the level to correspond to county, and I want to do it for each year. For the sake of this example, assume I only want to do it for 2022. I got RSelenium up and running, but everything I've tried to find the select menu elements with RSelenium hasn't worked.
For instance:
remDr <- remote_driver$client
remDr$open()
remDr$navigate("https://public.education.mn.gov/MDEAnalytics/DataTopic.jsp?TOPICID=11")
data_table <- remDr$findElement(using = 'id', value = "cmbCOLuMN")
returns an error: "An element could not be located on the page using the given search parameters".
I've tried changing the using and value parameters in findElement(), but still no luck. I would be grateful for any insight into how to select the level to be County and the year to be 2022.
Update: I was able to make more progress with this code, based on a previous Stack Overflow answer that discusses iframes, but I'm still getting stuck at the end:
remDr$open()
remDr$navigate("https://public.education.mn.gov/MDEAnalytics/DataTopic.jsp?TOPICID=11")
frames <- remDr$findElements('css', "iframe")
remDr$switchToFrame(frames[[1]])
selectElem <- remDr$findElement("id", "cmbCOLUMN1")
selectOpt <- selectElem$selectTag()
I can't work out how to use selectOpt to choose the value I want. I was expecting something like selectOpt$text$County to work.
It looks like there's a second iframe. I ran your code through this:
frames <- remDr$findElements('css', "iframe")
remDr$switchToFrame(frames[[1]])
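Once inside that first iframe, the dropdown from your update should be reachable. On the selectOpt part of the question: selectTag() returns a list (with elements, text, value and selected), so an option is picked by clicking the matching entry in elements rather than through selectOpt$text$County. A minimal sketch, assuming the option's visible text is exactly "County" (I haven't verified the labels on that page); select_option_by_text is a hypothetical helper name, not part of RSelenium:

```r
# Hypothetical helper: choose an <option> inside a <select> by its visible text.
# `remDr` is an open RSelenium remoteDriver already switched into the iframe.
select_option_by_text <- function(remDr, select_id, label) {
  select_elem <- remDr$findElement(using = "id", value = select_id)
  opts <- select_elem$selectTag()  # list with $elements, $text, $value, $selected
  idx <- which(trimws(unlist(opts$text)) == label)
  if (length(idx) == 0) {
    stop("No option labelled '", label, "' in #", select_id)
  }
  # Click the first matching <option> web element
  opts$elements[[idx[1]]]$clickElement()
  invisible(idx[1])
}

# Usage (assumption: the option is labelled "County"):
# select_option_by_text(remDr, "cmbCOLUMN1", "County")
```

This step is optional for the approach below, which lists every file and leaves the filtering to R.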
Then clicked the "list files" button using its html id.
# click on a button ------------------------------------
remDr$findElement(using = "id",value = "button1")$clickElement()
Clicking that button shows all the available files in a second iframe, so I found the second iframe by its HTML id (#report) and switched to it.
# switch to iframe 2 ------------------------------------
report_frame <- remDr$findElement(using = "id",value = "report")
remDr$switchToFrame(report_frame)
Then I pulled the page's HTML and scanned it for tables:
# Pull page html (read_html(), html_table() and %>% come from rvest)
library(rvest)
page_html <- remDr$getPageSource()[[1]] %>%
  read_html()
# extract tables
tables <- page_html %>% html_table()
files_table <- tables[[2]]
I'm assuming this is what you wanted? A data frame with a list of all the available files:
# A tibble: 1,458 × 6
X1 X2 X3 X4 X5 X6
<chr> <chr> <chr> <chr> <chr> <chr>
1 "" "" "" "" "" ""
2 "Level" "Name" "Year" "Document" "Data… "Hel…
3 "County" "Aitkin" "2022" "2022 Minnesota Student Survey County" "pdf" ""
4 "County" "Aitkin" "2019" "2019 Minnesota Student Survey County" "pdf" ""
5 "County" "Aitkin" "2016" "2016 Minnesota Student Survey County" "pdf" ""
6 "County" "Aitkin" "2013" "2013 Minnesota Student Survey County" "pdf" ""
7 "County" "Anoka" "2022" "2022 Minnesota Student Survey County" "pdf" ""
8 "County" "Anoka" "2019" "2019 Minnesota Student Survey County" "pdf" ""
9 "County" "Anoka" "2016" "2016 Minnesota Student Survey County" "pdf" ""
10 "County" "Anoka" "2013" "2013 Minnesota Student Survey County" "pdf" ""
# … with 1,448 more rows
# ℹ Use `print(n = ...)` to see more rows
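Since you only want County-level files for 2022, the table can be tidied and filtered in plain R. This is a rough sketch, assuming (as in the printout) that row 1 is blank and row 2 holds the real column names. tidy_files_table is a hypothetical helper, and the toy data frame uses made-up rows ("District", "Doc A", and the placeholder headers "Col5"/"Col6") so the snippet runs without a browser session:

```r
# Hypothetical helper: tidy the raw table returned by html_table().
# Assumes row 1 is blank and row 2 holds the real column names.
tidy_files_table <- function(raw) {
  raw <- as.data.frame(raw, stringsAsFactors = FALSE)
  header <- as.character(unlist(raw[2, ]))
  out <- raw[-(1:2), , drop = FALSE]  # drop the blank row and the header row
  names(out) <- header
  rownames(out) <- NULL
  out
}

# Toy stand-in for files_table (made-up rows; "Col5"/"Col6" are placeholders).
toy <- data.frame(
  X1 = c("", "Level", "County", "County", "District"),
  X2 = c("", "Name", "Aitkin", "Aitkin", "Example"),
  X3 = c("", "Year", "2022", "2019", "2022"),
  X4 = c("", "Document", "Doc A", "Doc B", "Doc C"),
  X5 = c("", "Col5", "pdf", "pdf", "pdf"),
  X6 = c("", "Col6", "", "", ""),
  stringsAsFactors = FALSE
)

files <- tidy_files_table(toy)
# Year is read as text, so compare against the string "2022"
county_2022 <- files[files$Level == "County" & files$Year == "2022", ]
print(county_2022)
```

With the real data you would call tidy_files_table(files_table) instead of using the toy frame.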