I'm facing an unusual problem when using RSelenium to navigate to a webpage containing quotation marks in the URL. The content does not load when I attempt to access the page directly. However, if I first visit an incorrect URL and then navigate to the desired page, the content loads successfully. Only version 5 of the code below seems to work as intended.
How can I directly access the correct page without visiting an incorrect URL first?
Load Selenium
rD <- RSelenium::rsDriver(browser = "chrome", port = 4444L, chromever = NULL, verbose = FALSE)
remDr <- rD[["client"]]
Version 1 - vanilla attempt (failed to load content)
url <- "https://www.sec.gov/edgar/search/#/q=%2522apple%2522&dateRange=custom&startdt=2017-01-01&enddt=2017-12-31"
remDr$navigate(url)
Version 2 - try again with URLencode (failed to load content)
url <- "https://www.sec.gov/edgar/search/#/q=%2522apple%2522&dateRange=custom&startdt=2017-01-01&enddt=2017-12-31"
url <- URLencode(url, reserved = TRUE)
remDr$navigate(url)
Version 3 - tried escaping quotation marks (failed to load content)
url <- 'https://www.sec.gov/edgar/search/#/q=\"apple\"&dateRange=custom&startdt=2017-01-01&enddt=2017-12-31'
remDr$navigate(url)
Version 4 - tried double backslashes instead of single ones (partially successful, but loaded incorrect content)
url <- 'https://www.sec.gov/edgar/search/#/q=\\"apple\\"&dateRange=custom&startdt=2017-01-01&enddt=2017-12-31'
remDr$navigate(url)
Version 5 - first load incorrect content with version 4, then try again and content loads correctly (success)
# step 1 load incorrect content
url <- 'https://www.sec.gov/edgar/search/#/q=\\"apple\\"&dateRange=custom&startdt=2017-01-01&enddt=2017-12-31'
remDr$navigate(url)
# step 2 now content loads correctly for some reason
url <- 'https://www.sec.gov/edgar/search/#/q="apple"&dateRange=custom&startdt=2017-01-01&enddt=2017-12-31'
remDr$navigate(url)
Edit
I added another failed version here:
Version 6 - tried with javascript execution (failed to load content)
url <- 'https://www.sec.gov/edgar/search/#/q="apple"&dateRange=custom&startdt=2017-01-01&enddt=2017-12-31'
remDr$executeScript(paste("window.location.href='", url, "';", sep=""))
Additionally, it's important to mention that the content in version 4 is incorrect. When you click on any data in version 4, the term 'apple' is not highlighted in the documents. However, in version 5, 'apple' is highlighted when you click the documents. Furthermore, the search form in version 4 includes backslashes, while in version 5, it only contains quotation marks.
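For reference, the query fragments used in the versions above can be compared in plain R without a browser. This is only a diagnostic sketch: the variable names are made up for illustration, and it shows what each string contains, not why the page behaves differently. It relies on two facts: inside a single-quoted R string `\"` is just a double quote, and `%25` is an encoded percent sign, so `%2522` needs two rounds of decoding to become `"`.

```r
# Diagnostic sketch: compare the query fragments from the versions above.
# Variable names are illustrative only.
v1 <- "q=%2522apple%2522"   # versions 1 and 2
v3 <- 'q=\"apple\"'         # version 3
v4 <- 'q=\\"apple\\"'       # version 4
v6 <- 'q="apple"'           # version 5 (step 2) and version 6

# Inside a single-quoted string, \" is a plain double quote,
# so version 3 is the same string as version 6.
same_v3_v6 <- identical(v3, v6)

# Version 4 contains two real backslash characters.
extra_chars_v4 <- nchar(v4) - nchar(v6)

# %25 decodes to "%", so one round of decoding leaves %22,
# and a second round yields the double quote.
decoded_once  <- utils::URLdecode(v1)
decoded_twice <- utils::URLdecode(decoded_once)

cat(v3, v4, decoded_once, decoded_twice, sep = "\n")
```

So version 3 sends exactly the same string as version 6, and version 4 sends literal backslashes, which matches the backslashes that appear in the search form.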
I tried the following code, and it seems to give the same result as your version 5:
library(RSelenium)
library(rvest)
url <- "https://www.sec.gov/edgar/search/#/q=%2522apple%2522&dateRange=custom&startdt=2017-01-01&enddt=2017-12-31"
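# Start a standalone Firefox Selenium container (shell() is Windows-only; use system() on other platforms)
shell('docker run -d -p 4446:4444 selenium/standalone-firefox')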
remDr <- remoteDriver(remoteServerAddr = "localhost", port = 4446L, browserName = "firefox")
remDr$open()
remDr$navigate(url)
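If the double-encoded form is the one to use, the URL above does not have to be typed by hand. The sketch below builds it from a plain search term. `build_edgar_url` is a hypothetical helper, not part of RSelenium, and it assumes the site expects the quoted term to be percent-encoded twice, as in the URL above. Terms containing spaces or other special characters have not been checked against the site.

```r
# Hypothetical helper (not part of RSelenium): builds the double-encoded
# search URL used above from a plain term and a date range.
build_edgar_url <- function(term, start, end) {
  # Wrap the term in double quotes for an exact-phrase search
  quoted <- paste0('"', term, '"')
  # First round of encoding: " becomes %22
  once <- utils::URLencode(quoted, reserved = TRUE)
  # Second round, applied to the percent signs only: %22 becomes %2522
  twice <- gsub("%", "%25", once, fixed = TRUE)
  paste0("https://www.sec.gov/edgar/search/#/q=", twice,
         "&dateRange=custom&startdt=", start, "&enddt=", end)
}

url <- build_edgar_url("apple", "2017-01-01", "2017-12-31")
```

The result can be passed to `remDr$navigate(url)` exactly as before.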