Scraping a website with form and JS using R


I'm trying to scrape a website that has a form and generates the information I want with JavaScript (I guess).

The website is https://www.distancecalculator.net/ and it calculates the distance between cities.

For instance, I want to calculate the distance between these two cities:

  • Craíbas - AL, Brasil
  • Maceió - AL, Brasil

It seems that, even though I'm using POST to submit the form, my scraper still collects the page as it is before the "Calculate" button is clicked. What am I doing wrong?

Here's my code:

library(httr)
library(rvest)

url <- "https://www.distancecalculator.net/"

fd <- list(
  submit = "Calculate Distance",
  "originCity"  = "Craíbas - AL, Brasil",
  "destinationCity" = "Maceió - AL, Brasil"
)

resp<-POST(url, body=fd, encode="form")
conte <- content(resp)
conte

tex <- conte %>% html_nodes(xpath = '//span[@id="driving-distance-km"]/text()') %>% html_text()
tex

Solution

  • I agree with the comment that RSelenium would be the best tool for this. Here is your desired result using RSelenium.

    library(RSelenium)
    
    url <- "https://www.distancecalculator.net/"
    
    #Start Selenium
    rD <- rsDriver(port = 4444L, browser = "chrome")
    remDr <- rD$client 
    remDr$navigate(url)
    
    # Type the origin city into the form
    originCity <- remDr$findElement(using = "css", "#originCity")
    originCity$sendKeysToElement(list("Craíbas - AL, Brasil"))
    # Give the autocomplete dropdown a moment to appear
    # (2 seconds is a guess; adjust as needed)
    Sys.sleep(2)
    # Click the first autocomplete suggestion
    suggestions <- remDr$findElements(using = "css", ".pac-item")
    suggestions[[1]]$clickElement()
    
    # Type the destination city into the form
    destinationCity <- remDr$findElement(using = "css", "#destinationCity")
    destinationCity$sendKeysToElement(list("Maceió - AL, Brasil"))
    # Give the autocomplete dropdown a moment to appear
    # (2 seconds is a guess; adjust as needed)
    Sys.sleep(2)
    # Click the first autocomplete suggestion
    suggestions <- remDr$findElements(using = "css", ".pac-item")
    suggestions[[1]]$clickElement()
    
    # No longer necessary: click the first element with class "button"
    # (presumably the Calculate button)
    calculate <- remDr$findElements(using = "xpath", '//*[contains(concat(" ", @class, " "), concat(" ", "button", " "))]')
    calculate[[1]]$clickElement()
    
    # Scrape the result
    dist <- remDr$findElements(using = "css", "#driving-distance-km")
    dist <- unlist(lapply(dist, function(x) {
      x$getElementText()
    }))
    dist
    # Close the browser and stop the Selenium server
    remDr$close()
    rD$server$stop()
    

    And a link to the RSelenium package information: https://cran.r-project.org/web/packages/RSelenium/vignettes/RSelenium-basics.html
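
Because the result is filled in by JavaScript, the distance element may still be empty when the script first reads it. A minimal sketch of one way to handle that is to poll until a non-empty value appears. The helper `wait_for_text` and the stand-in `fake_element_text` are hypothetical names used only for illustration (they are not part of RSelenium), and "123 km" is a placeholder value, not a real result from the site.

```r
# Hypothetical helper: call `check()` repeatedly until it returns a single
# non-empty character string, or stop with an error once `timeout` seconds pass.
wait_for_text <- function(check, timeout = 10, interval = 0.5) {
  deadline <- Sys.time() + timeout
  repeat {
    # Treat errors (e.g. element not found yet) the same as an empty value
    value <- tryCatch(check(), error = function(e) "")
    if (length(value) == 1 && is.character(value) && nzchar(value)) {
      return(value)
    }
    if (Sys.time() >= deadline) {
      stop("Timed out waiting for a non-empty value")
    }
    Sys.sleep(interval)
  }
}

# Stand-in for a page element that is empty on the first two reads
# ("123 km" is a placeholder, not real data)
calls <- 0
fake_element_text <- function() {
  calls <<- calls + 1
  if (calls < 3) "" else "123 km"
}

result <- wait_for_text(fake_element_text, timeout = 5, interval = 0.1)
print(result)
```

With RSelenium, the `check` argument could be something like `function() unlist(remDr$findElement(using = "css", "#driving-distance-km")$getElementText())`, assuming `remDr` is the client created in the answer above.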