I need to extract articles from this website, including the title, date, and URL of each:
https://en.news-front.info/category/ukraine-2/
I'm using the rvest package, but I'm having difficulty because the page has a "show more" button that loads the remaining articles. How do I go about doing this? I need the articles through March 2021.
Thank you.
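This approach works for extracting articles that are loaded by the "show more" button. It drives a real browser with RSelenium, clicks the button repeatedly, and then parses the resulting page source: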
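library(RSelenium)

# Start a Selenium server and open a Chrome session.
# chromever must be a ChromeDriver version compatible with the installed Chrome.
rD1 <- rsDriver(browser = "chrome", port = 4567L, geckover = NULL,
                chromever = "99.0.4844.51", iedrver = NULL,
                phantomver = NULL)
remDr1 <- rD1[["client"]]

# Open the category page
remDr1$navigate("https://en.news-front.info/category/ukraine-2/")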
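# Click "show more" up to 51 times; raise the limit if the page
# has not yet reached March 2021 after the loop finishes
for (i in seq_len(51)) {
  # find the button; findElement() throws an error once it is gone
  more_btn <- tryCatch(
    remDr1$findElement(using = "css selector", ".btn-load-more"),
    error = function(e) NULL
  )
  if (is.null(more_btn)) break
  # click the button
  more_btn$clickElement()
  # wait for the next batch of articles to load
  Sys.sleep(2)
}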
# Scrape the article titles from the fully expanded page
page <- xml2::read_html(remDr1$getPageSource()[[1]])
title_nodes <- rvest::html_nodes(page, ".article-link__title")
title <- tibble::tibble(title = rvest::html_text(title_nodes, trim = TRUE))
title
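
The code above only collects titles, while the question also asks for the date and URL. Here is a minimal sketch of how all three could be pulled from the same page source. Only `.article-link__title` comes from the answer above. The `a.article-link` wrapper, the `.article-link__date` selector, the `dd.mm.yyyy` date format, and the sample HTML are assumptions made for illustration, so check the real markup in the browser's developer tools and adjust them.

```r
library(rvest)

# Stand-in for remDr1$getPageSource()[[1]]; this markup is hypothetical.
page_source <- '
<div>
  <a class="article-link" href="https://example.com/a1">
    <span class="article-link__title"> First article </span>
    <span class="article-link__date">01.03.2021</span>
  </a>
  <a class="article-link" href="https://example.com/a2">
    <span class="article-link__title">Second article</span>
    <span class="article-link__date">28.02.2021</span>
  </a>
</div>'

page  <- read_html(page_source)
cards <- html_nodes(page, "a.article-link")  # assumed wrapper element

# html_node() returns exactly one match (or a missing node) per card,
# so the three columns stay aligned even if a field is absent.
articles <- data.frame(
  title = html_text(html_node(cards, ".article-link__title"), trim = TRUE),
  date  = html_text(html_node(cards, ".article-link__date"), trim = TRUE),  # assumed selector
  url   = html_attr(cards, "href"),
  stringsAsFactors = FALSE
)

# Keep articles from 1 March 2021 onward (assumed date format: dd.mm.yyyy)
parsed_date <- as.Date(articles$date, format = "%d.%m.%Y")
recent <- articles[!is.na(parsed_date) & parsed_date >= as.Date("2021-03-01"), ]
recent
```

With the live page, replace `page_source` with `remDr1$getPageSource()[[1]]`. Selecting the per-article wrapper first and then looking inside each one avoids misaligned columns, which can happen when titles, dates, and links are scraped as three separate vectors.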