Using purrr::map to loop through web pages for scraping with RSelenium


I have a basic R script, cobbled together using RSelenium, that logs me into a website. Once authenticated, the script goes to the first page of interest and pulls 3 pieces of text from the page.

Luckily for me, the URL is constructed in such a way that I can pass in a vector of numbers to reach each subsequent page of interest, hence the use of map().
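To make the idea concrete, here is a minimal sketch of how a vector of page numbers can be turned into a vector of URLs. It assumes the URL template shown later in the question (XXXXXX is the question's own placeholder for the account id) and relies only on the fact that sprintf() is vectorised over its arguments.

```r
# URL template taken from the question; %d marks where the rcId goes.
# XXXXXX is the question's placeholder, not a real account id.
url_template <- "https://myWebsite.com/rc_redesign/#/layout/jcard/drugCard?accountId=XXXXXX&rcId=%d&searchType=R&reimbCode=&searchTerm=&searchTexts=*"

# sprintf() is vectorised, so one call yields one URL per page number.
rc_ids <- 1:3
urls <- sprintf(url_template, rc_ids)

urls[2]
```

Each element of urls could then be passed to remdr$navigate() inside a map() call.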

While on each page I want to scrape the same 3 elements off the page and store them in a master data frame for later analysis.

I wish to use the map family of functions so that I can become more familiar with them, but I am really struggling to get them to work. Could anyone kindly tell me where I am going wrong?

Here is the main part of my code (go to the website and log in):

library(RSelenium)
# https://stackoverflow.com/questions/55201226/session-not-created-this-version-of-chromedriver-only-supports-chrome-version-7/56173984
rd <- rsDriver(browser = "chrome",
               chromever = "88.0.4324.27",
               port = netstat::free_port())

remdr <- rd[["client"]]

# url of the site's login page
url <- "https://www.myWebsite.com/"

# Navigating to the page
remdr$navigate(url)

# Wait 5 secs for the page to load
Sys.sleep(5)

# Find the initial login button to bring up the username and password fields
loginbutton <- remdr$findElement(using = 'css selector','.plain')

# Click the initial login button to bring up the username and password fields
loginbutton$clickElement()

# Find the username box
username <- remdr$findElement(using = 'css selector','#username')

# Find the password box
password <- remdr$findElement(using = 'css selector','#password')

# Find the final login button
login <- remdr$findElement(using = 'css selector','#btnLoginSubmit1')

# Input the username
username$sendKeysToElement(list("myUsername"))

# Input the password
password$sendKeysToElement(list("myPassword"))

# Click login
login$clickElement()

And hey presto we're in!
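The fixed five-second pause in the login script is simple, but it either wastes time or is too short on a slow page. Below is a minimal sketch of a polling alternative. It assumes that findElement() raises an error while the element is absent. wait_for is a hypothetical helper written for this illustration, not part of RSelenium.

```r
# Call `check` repeatedly until it returns without error,
# or stop with an error once `timeout` seconds have passed.
wait_for <- function(check, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    result <- tryCatch(check(), error = function(e) NULL)
    if (!is.null(result)) return(result)
    if (Sys.time() >= deadline) stop("Timed out waiting for condition")
    Sys.sleep(interval)
  }
}

# Possible use with the session from the question (not run here):
# loginbutton <- wait_for(function() {
#   remdr$findElement(using = "css selector", ".plain")
# })
```

The helper returns whatever check() returns, so it can replace the Sys.sleep() plus findElement() pair in one step.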

Now my code takes me to the initial page of interest (index = 1)

As mentioned above, I want to step through each page, which I can do by substituting an integer for the rcId parameter in the URL:

#remdr$navigate("https://myWebsite.com/rc_redesign/#/layout/jcard/drugCard?accountId=XXXXXX&rcId=1&searchType=R&reimbCode=&searchTerm=&searchTexts=*") # Navigating to the page

For each rcId in 1:9999, I wish to grab the following 3 elements and store them in a data frame:

hcpcs_info <- remdr$findElement(using = 'class','is-jcard-heading')

hcpcs <- hcpcs_info$getElementText()[[1]]

hcpcs_description <- remdr$findElement(using = 'class','is-jcard-desc')

hcpcs_desc <- hcpcs_description$getElementText()[[1]]

tc_info <- remdr$findElement(using = 'css selector','.col-12.ng-star-inserted')

therapeutic_class <- tc_info$getElementText()[[1]]
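The three lookups above repeat the same find-then-get-text pattern, and across thousands of pages some may lack an element. The sketch below wraps that pattern in a helper that returns NA instead of stopping. get_text and fake_driver are hypothetical names introduced here. fake_driver is only a stand-in so the helper can be tried without a browser, and it assumes the real client raises an error when nothing matches.

```r
# Return the text of the first matching element, or NA if the lookup fails.
# `driver` is assumed to behave like the `remdr` client from the question.
get_text <- function(driver, using, value) {
  tryCatch(
    driver$findElement(using = using, value = value)$getElementText()[[1]],
    error = function(e) NA_character_
  )
}

# Hypothetical stand-in for `remdr`: knows one class and errors on any other.
fake_driver <- list(
  findElement = function(using, value) {
    if (value != "is-jcard-heading") stop("no such element")
    list(getElementText = function() list("example heading"))
  }
)

found   <- get_text(fake_driver, "class", "is-jcard-heading")
missing <- get_text(fake_driver, "class", "is-jcard-desc")
```

With the real session, the call would look like get_text(remdr, "class", "is-jcard-heading").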

I have tried creating a separate function and passing it to map, but I am not advanced enough to piece this together. Below is what I have tried.

my_function <- function(index) {
  remdr$navigate(sprintf("https://rc2.reimbursementcodes.com/rc_redesign/#/layout/jcard/drugCard?accountId=113479&rcId=%d&searchType=R&reimbCode=*&searchTerm=*&searchTexts=*",index)
                 Sys.sleep(5)
                 hcpcs_info[index] <- remdr$findElement(using = 'class','is-jcard-heading')
                 hcpcs[index] <- hcpcs_info$getElementText()[index][[1]])
}

x <- 1:10 %>% 
map(~ my_function(.x))

Any help would be greatly appreciated.


Solution

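  • In your function the navigate() call is never closed (the closing parenthesis ends up at the end of the last line), and the assignments to hcpcs_info[index] and hcpcs[index] only change local copies that are discarded when the function returns. Instead, have each iteration return a one-row tibble and let map_df() bind the rows into a single data frame. Try the following: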

    library(RSelenium)
    library(purrr)
    library(tibble)   # needed for tibble()
    
    # Assumes remdr is the logged-in client created in the question.
    # Each iteration returns a one-row tibble; map_df() row-binds them.
    result <- map_df(1:10, ~{
      remdr$navigate(sprintf("https://rc2.reimbursementcodes.com/rc_redesign/#/layout/jcard/drugCard?accountId=113479&rcId=%d&searchType=R&reimbCode=*&searchTerm=*&searchTexts=*", .x))
      Sys.sleep(5)   # give the page time to render
      hcpcs_info <- remdr$findElement(using = 'class', 'is-jcard-heading')
      hcpcs <- hcpcs_info$getElementText()[[1]]
      hcpcs_description <- remdr$findElement(using = 'class', 'is-jcard-desc')
      hcpcs_desc <- hcpcs_description$getElementText()[[1]]
      tc_info <- remdr$findElement(using = 'css selector', '.col-12.ng-star-inserted')
      therapeutic_class <- tc_info$getElementText()[[1]]
      tibble(hcpcs, hcpcs_desc, therapeutic_class)
    })
    result
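With thousands of rcId values, some pages may fail to load or lack an element, and a single error would abort the whole map. Below is a sketch of one way to guard against that with purrr::possibly(). scrape_page is a hypothetical stand-in that fails on one index rather than driving a browser, and the column names are made up for illustration.

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for the per-page scraping code; fails on index 3
# to imitate a page with a missing element.
scrape_page <- function(index) {
  if (index == 3) stop("element not found")
  tibble(rc_id = index, hcpcs = sprintf("code-%d", index))
}

# possibly() wraps a function so it returns `otherwise` instead of raising an error.
safe_scrape <- possibly(scrape_page, otherwise = NULL)

# One list element per page; the failed page is NULL.
pages <- map(1:5, safe_scrape)

# bind_rows() ignores the NULL entry and stacks the remaining rows.
result <- dplyr::bind_rows(pages)
result$rc_id
```

Returning NULL drops failed pages silently. Returning a one-row tibble of NA values instead would keep a record of which rcId failed.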