This page seems almost infinite: it shows over 6,000 profiles on a dynamically loading page.
A similar one shows only 310 profiles, so scrolling to its end takes much less time.
Is there a way to write a single script that can scrape both pages by scrolling to the end?
For a similar purpose, I have used RSelenium code like this:
journal_url <- "https://www.frontiersin.org/journals/photonics#editorial-board"
rD <- RSelenium::rsDriver(browser = "chrome", port = 4546L, verbose = FALSE, chromever = "87.0.4280.20")
remDr <- rD$client   # the scrolling code needs the client object, which the snippet never assigned
remDr$navigate(journal_url)
# scroll down a fixed number of times, pausing so new content can load
for (i in 1:5) {
  remDr$executeScript(paste0("window.scrollTo(0, ", i * 10000, ");"))
  Sys.sleep(3)
}
But in the present case, while scrolling five times as in for(i in 1:5) may suffice for the second page (with 310 profiles), it will not be enough for the first one (with 6,000 profiles). If someone could point me to a single script that can handle pages of varying sizes, I would be very grateful!
I believe you'll find your answer here, under the subheader "Scroll Down Until the End (Not Recommended if There Are too Many Pages)".
Edit: Here's the suggested code from the link.
element <- driver$findElement("css", "body")
flag <- TRUE
counter <- 0
n <- 5
while (flag) {
  counter <- counter + 1
  # compare the page source only every n (= 5) scrolls, since a single
  # scroll-down sometimes doesn't render any new content
  for (i in 1:n) {
    element$sendKeysToElement(list(key = "page_down"))
    Sys.sleep(2)
  }
  if (exists("pagesource")) {
    if (pagesource == driver$getPageSource()[[1]]) {
      # the source did not change, so the end of the page has been reached
      flag <- FALSE
      writeLines(paste0("Scrolled down ", n * counter, " times."))
    } else {
      pagesource <- driver$getPageSource()[[1]]
    }
  } else {
    pagesource <- driver$getPageSource()[[1]]
  }
}
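For completeness, here is a minimal sketch of how the pieces might fit together for the pages in the question, reusing the rsDriver setup from the question. Note that ".profile-card" is a hypothetical selector: I don't know the actual markup of the profile entries, so you would need to replace it with the real one.

library(RSelenium)
library(rvest)

journal_url <- "https://www.frontiersin.org/journals/photonics#editorial-board"

# start the driver and open the page (same setup as in the question)
rD <- rsDriver(browser = "chrome", port = 4546L, verbose = FALSE, chromever = "87.0.4280.20")
driver <- rD$client
driver$navigate(journal_url)

# scroll until the page source stops changing (the loop from above, condensed)
element <- driver$findElement("css", "body")
pagesource <- ""
repeat {
  for (i in 1:5) {
    element$sendKeysToElement(list(key = "page_down"))
    Sys.sleep(2)
  }
  if (pagesource == driver$getPageSource()[[1]]) break
  pagesource <- driver$getPageSource()[[1]]
}

# parse the fully loaded page; ".profile-card" is a hypothetical selector
profiles <- read_html(pagesource) |>
  html_elements(".profile-card") |>
  html_text2()
length(profiles)  # roughly 6,000 or 310, depending on the page

driver$close()
rD$server$stop()

Because the stopping condition is "the page source stopped changing", the same loop works whether the page holds 310 profiles or 6,000; only the running time differs.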