
Web scraping dynamic pages in R


I have changed the website to make this question better. I'm still facing similar issues: I can't do this with the rvest package alone, and the answer may be easier to reach with RSelenium. The website is http://ravimaailma.fi/cg/tulokset/20/ and I want to obtain the links from the main article listing that lead to the individual race results. The links look something like this: http://ravimaailma.fi/article/tulokset/pori-18-11-2017-tulokset/8718/

I'm trying to use plain rvest, as I thought that would be all that's needed here. SelectorGadget gives the CSS selector for the links as .article-title a, so my code is simply

library(rvest)
url <- "http://ravimaailma.fi/cg/tulokset/20/"

url %>%
  read_html() %>%
  html_nodes(".article-title a") %>%
  html_text()
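
(For what it's worth, swapping html_text() for html_attr("href") is how the URLs rather than the link text would come out, but it stays just as empty, presumably because the article list is filled in by JavaScript after the initial page load:)

url %>%
  read_html() %>%
  html_nodes(".article-title a") %>%
  html_attr("href")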

This returns nothing. The website loads more results when you scroll down, but I thought I would at least get the first results out. The code below does give out some links, and links 28:32 look promising, but I think those come from the sidebar, not from the article listing.

url %>%
  read_html() %>% 
  html_nodes("a") %>% 
  html_attr("href")

What am I doing wrong here, and can RSelenium help me?


Solution

  • Here is my partial answer. It still doesn't get everything, but maybe it helps someone. The code returns one link for the first result; I'm not sure why it isn't giving them all. I'm using

    library(RSelenium)
    
    # Start a Selenium server and open a Chrome session
    rD <- rsDriver(port = 4444L, browser = "chrome")
    remDr <- rD[["client"]]
    remDr$navigate("http://ravimaailma.fi/cg/tulokset/20/")
    
    # Grab the first .article-title link and read its href attribute
    elem <- remDr$findElement(using = "css selector", value = ".article-title a")
    elemtxt <- elem$getElementAttribute("href")
    
    # Click button to load more results
    #button <- remDr$findElement(using = "id", value = "loadmore")
    #button$clickElement()
    
    # Close the browser session
    remDr$close()
    

    I haven't used the button click yet, but it seemed to work as well. The only problem is that I can't get all the results from the site.
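
    A sketch of how the remaining pieces might fit together, assuming the load-more button really has the id loadmore from the commented-out lines above (untested against the live site): findElements(), the plural form, returns every matching element instead of only the first, and clicking the button a few times before scraping should expose the older results as well.

    library(RSelenium)
    
    rD <- rsDriver(port = 4444L, browser = "chrome")
    remDr <- rD[["client"]]
    remDr$navigate("http://ravimaailma.fi/cg/tulokset/20/")
    
    # Click "load more" a few times so older results get added to the page.
    # The id "loadmore" is taken from the commented-out code above and may differ.
    for (i in 1:5) {
      button <- tryCatch(
        remDr$findElement(using = "id", value = "loadmore"),
        error = function(e) NULL
      )
      if (is.null(button)) break   # no button found, nothing more to load
      button$clickElement()
      Sys.sleep(2)                 # give the new articles time to render
    }
    
    # findElements() (plural) returns a list of all matching elements,
    # which is why findElement() above only ever produced one link
    elems <- remDr$findElements(using = "css selector", value = ".article-title a")
    links <- unlist(lapply(elems, function(e) e$getElementAttribute("href")))
    
    remDr$close()
    rD[["server"]]$stop()   # also stop the Selenium server, not just the browser
    
    links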