Tags: r, web-scraping, rvest, rselenium

Web scraping a page with a "show more" button


I need to extract the articles from this website, including each article's title, date, and URL:

https://en.news-front.info/category/ukraine-2/

I'm using the rvest package, but I'm having trouble because the remaining articles only load after clicking a "show more" button. How do I go about this? I need the articles through March 2021.

Thank you


Solution

  • This solution uses RSelenium to click the "show more" button repeatedly, then extracts the article titles from the expanded page.

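    library(RSelenium)
    # Start a Selenium server and a Chrome session; chromever must match
    # a chromedriver version that works with the locally installed Chrome
    rD1 <- rsDriver(browser = "chrome", port = 4567L, geckover = NULL,
                    chromever = "99.0.4844.51", iedrver = NULL,
                    phantomver = NULL)
    remDr1 <- rD1[["client"]]
    remDr1$navigate("https://en.news-front.info/category/ukraine-2/")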
    
    webElem <- remDr1$findElement(using = 'css selector', ".btn-load-more")
    webElem$clickElement()
    
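    # Click "show more" up to 50 more times; stop early once the button is gone
    for (i in seq_len(50)) {
      # find button (findElements returns an empty list instead of raising an error)
      morereviews <- remDr1$findElements(using = 'css selector', ".btn-load-more")
      if (length(morereviews) == 0) break
      # click button
      morereviews[[1]]$clickElement()
      # wait for the next batch of articles to load
      Sys.sleep(2)
    }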
    
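    # Scrape the article titles from the expanded page
    library(magrittr)  # provides %>%, which the pipeline below needs
    title <- xml2::read_html(remDr1$getPageSource()[[1]]) %>%
      rvest::html_nodes(".article-link__title") %>%
      rvest::html_text() %>%
      tibble::tibble(title = .)  # tibble() replaces the deprecated dplyr::data_frame()
    title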
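
The code above only collects titles, while the question also asks for the date and URL. A minimal sketch of how all three fields might be pulled from the same page source follows. Only `.article-link__title` comes from the answer; the function name `extract_articles` and the selectors `a.article-link` and `.article-link__date` are hypothetical and would need to be checked against the real page in the browser's inspector.

```r
library(rvest)

# Hypothetical helper: turns a page source string into one row per article.
# ASSUMPTIONS (not confirmed by the answer above):
#   - each article is wrapped in an <a class="article-link"> whose href is the URL
#   - the date sits in an element with class "article-link__date"
# Only ".article-link__title" is taken from the original answer.
extract_articles <- function(page_source,
                             item_css  = "a.article-link",
                             title_css = ".article-link__title",
                             date_css  = ".article-link__date") {
  # One node per article
  items <- html_nodes(read_html(page_source), item_css)

  # html_node() returns exactly one result per item (missing if absent),
  # so the three columns stay aligned even when a field is not found
  data.frame(
    title = html_text(html_node(items, title_css), trim = TRUE),
    date  = html_text(html_node(items, date_css), trim = TRUE),
    url   = html_attr(items, "href"),
    stringsAsFactors = FALSE
  )
}
```

With the Selenium session from the answer still open, this could be called as `extract_articles(remDr1$getPageSource()[[1]])`. The dates come back as text, so they would still need parsing with `as.Date()` and whatever format the site prints before filtering down to March 2021.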