I am working on a project right now that requires me to scrape information from this website:
I have already managed to scrape the table with RSelenium and rvest. But there are some details I would like to add to the dataframe, which sit inside collapsible elements toggled by JavaScript. I have illustrated one of them here:
Essentially, I need to expand ALL of them before scraping in order to include them. Is there an easy way to do this programmatically? Yesterday I ran a script that clicked them one by one, which took hours to complete.
Is it possible to inject JavaScript into the page that expands them all at once, or have RSelenium execute such a script?
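If you do want to stay with RSelenium, its `executeScript()` method can run arbitrary JavaScript in the page, so all expanders can be clicked in one call instead of one at a time. This is only a sketch: it assumes an already-open session in `remDr`, and the `button[data-action]` selector is an assumption you should verify against the real markup in DevTools.

```r
library(RSelenium)  # assumes a running session stored in `remDr`

# Click every expander button in a single round trip.
# NOTE: 'button[data-action]' is a guess at the expand buttons' selector --
# inspect the page and adjust it to the actual attribute/class used.
js <- "document.querySelectorAll('button[data-action]').forEach(function (b) { b.click(); });"

remDr$executeScript(js)
```

After the script runs, a single `read_html(remDr$getPageSource()[[1]])` picks up the expanded content in one pass.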
The script below skips the browser entirely: it POSTs the search form with httr and then follows each row's detail link directly, so no clicking is needed. Modify `query` with the date range and the court you need; I picked the first court in the list. You can find the court codes by inspecting the page and searching for FormData.Court.
library(tidyverse)
library(httr)
library(rvest)
#################################
## FUNCTION TO PROCESS DATA ##
#################################
parseDetails <- function(pg_dtls) {
  # Field labels (<dt>) and values (<dd>) from the detail page
  dt <- pg_dtls %>%
    html_elements('dt') %>%
    html_text()
  dd <- pg_dtls %>%
    html_elements('dd') %>%
    html_text()
  # Any table embedded in the detail page
  tbl <- pg_dtls %>%
    html_element('table') %>%
    html_table()
  # One-column data frame: labels as row names, values in "Details"
  df <- as.data.frame(dd, row.names = dt) %>% set_names("Details")
  list(mainData = df, otherData = tbl)
}
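As a quick sanity check, the `<dt>`/`<dd>` pairing that `parseDetails` relies on can be exercised on an inline fragment; the field names below are invented for illustration, not taken from the real site:

```r
library(rvest)

# Toy fragment mimicking a detail panel; labels and values are made up.
frag <- minimal_html("<dl><dt>Dato</dt><dd>25.02.2022</dd><dt>Type</dt><dd>Anke</dd></dl>")

labels <- frag %>% html_elements("dt") %>% html_text()
values <- frag %>% html_elements("dd") %>% html_text()
```

Because `html_elements()` returns nodes in document order, `labels` and `values` line up positionally, which is what makes the `row.names = dt` trick in the function work.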
###############################
## SCRAPE WEB PAGE ##
###############################
url <- "https://www.domstol.no/enkelt-domstol/hoyesterett/saksliste/berammingsliste/"
query <- list('FormData.From'  = "25.02.2022",
              'FormData.To'    = "07.03.2022",
              'FormData.Court' = "AAAA2104220835148622091WAFLAU#EJBOrgUnit")
response <- POST(url, body = query)
dtls <- content(response, "parsed") %>%
  html_elements("button") %>%
  html_attr("data-action") %>%          # relative links to the detail pages
  na.omit() %>%
  paste0("https://www.domstol.no/", .) %>%
  map(read_html)
scrapedData <- map(dtls, parseDetails)
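If you then want all the scraped details in a single tibble, something like the following works. This is a sketch: the field names are invented stand-ins for whatever labels the real pages use, and a toy `scrapedData` is built in place so the snippet runs on its own.

```r
library(tidyverse)

# Toy stand-in for scrapedData so the snippet is self-contained;
# each element mirrors the list shape parseDetails() returns.
scrapedData <- list(
  list(mainData = data.frame(Details = c("25.02.2022", "Anke"),
                             row.names = c("Dato", "Type")),
       otherData = NULL)
)

# One row per case: the detail labels become column names.
allDetails <- map_dfr(scrapedData, ~ .x$mainData %>%
                        rownames_to_column("Field") %>%
                        pivot_wider(names_from = Field, values_from = Details))
```

`map_dfr` row-binds the per-case frames, so cases with differing fields simply get `NA` in the columns they lack.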