
RSelenium or JavaScript: Expand all expandable elements on a website


I am working on a project right now that requires me to scrape information from this website:

I have already managed to scrape the table with RSelenium and rvest. But there are some details I would like to add to the data frame, which can be found in an expandable Java (?) object. I have illustrated the object here:

[Image: Tables]

Essentially, I need to expand ALL of them before scraping in order to include them. Is there an easy way to do this with code? Yesterday I had a script that clicked them one by one, which took hours to complete.

Is it possible to inject code into the website that expands them all, or to have RSelenium execute such code?


Solution

  • You do not need to expand anything in a browser. The script below submits the page's search form directly, reads the data-action attribute of each button (it holds the relative URL of that row's details) and downloads those detail pages. Modify query with the date range and the court you need. I picked the first court in the list. You can find the court codes by inspecting the page and searching for FormData.Court.

        library(tidyverse)
        library(httr)
        library(rvest)
        
        #################################
        ## FUNCTION TO PROCESS DATA    ##
        #################################
        
        parseDetails <- function(pg_dtls){
          
          dt <- pg_dtls %>%
            html_elements('dt') %>%
            html_text()
          
          dd <- pg_dtls %>%
            html_elements('dd') %>%
            html_text()
          
          tbl <- pg_dtls %>%
            html_element('table') %>%
            html_table()
          
          df <- as.data.frame(dd, row.names = dt) %>% set_names("Details")
          
          # Return both parts as a named list (visible return value)
          list(mainData = df, otherData = tbl)
        }
        
        ###############################
        ##  SCRAPE WEB PAGE          ##
        ###############################
        
        url <- "https://www.domstol.no/enkelt-domstol/hoyesterett/saksliste/berammingsliste/"
        
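        # Form fields sent with the search request: date range and court code
        query <- list('FormData.From'  = "25.02.2022",
                      'FormData.To'    = "07.03.2022",
                      'FormData.Court' = "AAAA2104220835148622091WAFLAU#EJBOrgUnit")
        response <- POST(url, body = query)
        
        # Each expandable row has a button whose data-action attribute holds the
        # relative URL of its details page; download and parse every one of them
        dtls <- content(response, "parsed") %>%
          html_elements("button") %>%
          html_attr("data-action") %>%
          na.omit() %>%
          paste0("https://www.domstol.no/", .) %>%
          map(read_html)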
        
        
        scrapedData <- map(dtls, parseDetails)
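
scrapedData is a list with one mainData/otherData pair per case, so you will probably want to flatten it before joining it to your existing table. The snippet below is only a rough sketch of one way to do that, and it runs offline. The HTML string is invented to mimic the dt/dd/table layout that parseDetails expects, and parse_details_wide is a hypothetical variant of parseDetails that puts each dt label in its own column so the pages can be stacked with rbind. It assumes every page has the same dt labels.

```r
library(rvest)

# Hypothetical detail page: the layout mimics what parseDetails() reads,
# but the field names and values are invented for illustration.
html <- '<html><body>
  <dl>
    <dt>Case number</dt><dd>22-000001</dd>
    <dt>Case type</dt><dd>Civil</dd>
  </dl>
  <table>
    <tr><th>Role</th><th>Name</th></tr>
    <tr><td>Judge</td><td>A. Example</td></tr>
  </table>
</body></html>'

# Hypothetical variant of parseDetails(): one row per page,
# one column per <dt> label.
parse_details_wide <- function(pg) {
  dt <- html_text2(html_elements(pg, "dt"))
  dd <- html_text2(html_elements(pg, "dd"))
  # t() turns the named vector into a 1-row matrix whose column names
  # are the <dt> labels
  main <- as.data.frame(t(setNames(dd, dt)), stringsAsFactors = FALSE)
  list(mainData = main, otherData = html_table(html_element(pg, "table")))
}

# Stand-in for dtls: two copies of the same invented page
pages  <- list(read_html(html), read_html(html))
parsed <- lapply(pages, parse_details_wide)

# Stack the per-page rows into a single data frame
all_main <- do.call(rbind, lapply(parsed, `[[`, "mainData"))
```

With the real data you would call lapply(dtls, parse_details_wide) instead of using pages. If the detail pages do not all share the same labels, rbind will fail and you would need a row-binding function that fills missing columns.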