r web-scraping rvest rselenium

How to get the Google geographic coordinates from a "div" tag with the "tablist" class on a page, using the RSelenium R package


I'm trying to get the geographic coordinates shown on an HTML page using functions from the RSelenium package for R. The goal is to get the values 20º27'36.1"S 54º38'03.1"W. My attempts are in the code below. I'd be grateful for any help.

library(rvest)
library(RSelenium)
library(httpuv)

port <- httpuv::randomPort()

rD <- rsDriver(browser = c("firefox"),
               verbose=TRUE,
               check = FALSE,
               port = port)

driver <- rD[["client"]]

urll <- "https://www.zapimoveis.com.br/lancamento/venda-apartamento-2-quartos-bairro-seminario-campo-grande-ms-46m2-id-2600496487/"
driver$navigate(urll)

politicas <- driver$findElement(using = "css",
                                value = "button.cookie-notifier__cta")
politicas$clickElement()

botaomapa <- driver$findElement(using = "xpath", "/html/body/main/div[1]/section/section/section[1]/button[2]")
botaomapa$clickElement()

# Attempt 1: absolute XPath to the coordinates element
coord <- driver$findElement(using = "xpath", "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div")  # error

# Attempt 2: search starting from the botaomapa element
coord <- botaomapa$findElement(using = "xpath", "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div")  # error

# Attempt 3: using the rvest package
readmap <- read_html(urll)
auxiliar <- readmap %>% html_elements("section")
auxiliar2 <- auxiliar %>% html_elements("#listing-map")
c1 <- readmap %>% html_nodes(xpath = "/html/body/main/div[1]/section/section/section[2]/div/div[3]/article/iframe")  # returns nothing
c2 <- auxiliar2 %>% html_nodes(xpath = "/html/body/main/div[1]/section/section/section[2]/div/div[3]/article/iframe")  # returns nothing
c3 <- auxiliar2 %>% html_nodes(xpath = "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div")  # returns nothing

Solution

  • The tricky bit is that the map is embedded in an iframe, and anything inside an iframe is hard to reach from the parent page. You can, however, find the iframe element itself and read its attributes. The URL in the iframe's src attribute contains the coordinates, so you can extract that URL and then pull the coordinates out of it.

    After this step in your original code:

    politicas$clickElement()
    

    I did this:

    library(stringr)
    library(rvest)
    
    # pull the webpage html
    html <- driver$getPageSource()[[1]]
    
    
    
    # look for the iframe's node
    # then pull the source attribute
    map_link <- html %>% 
      read_html() %>% 
      html_node(".map-embed__iframe") %>%
      html_attr("src")
    
    

    Here's what the link looks like:

    map_link
    [1] "https://www.google.com/maps/embed/v1/place?key=AIzaSyB1BH90qSMLRWrSEKe8D7fml7-kWHN2qjY&q=-20.460039,-54.634191"
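
    If the iframe has not been rendered yet, the pipeline above returns NA instead of a link, so it can help to wrap the step in a function and try it offline first. The following is a minimal sketch under stated assumptions: get_map_link is a hypothetical helper, the HTML string is an invented stand-in for driver$getPageSource()[[1]], and KEY is a placeholder for the API key. Only the class name and the link format come from the answer above.

    ```r
    library(rvest)

    # Invented stand-in for driver$getPageSource()[[1]] (not the real page)
    page_source <- paste0(
      '<html><body><article>',
      '<iframe class="map-embed__iframe" ',
      'src="https://www.google.com/maps/embed/v1/place?key=KEY&amp;q=-20.460039,-54.634191">',
      '</iframe></article></body></html>'
    )

    # Hypothetical helper: return the src of the first element matching `css`,
    # or NA when the page source contains no such element
    get_map_link <- function(page_source, css = ".map-embed__iframe") {
      page_source %>%
        read_html() %>%
        html_element(css) %>%
        html_attr("src")
    }

    demo_link <- get_map_link(page_source)
    no_link <- get_map_link("<html><body><p>no map here</p></body></html>")
    ```

    In a live session you would pass driver$getPageSource()[[1]] instead of the stand-in string, and check for NA before going further.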
    

    Then you can use a regular expression (or any other string method) to extract the coordinates:

    # remove everything up to and including "q="
    map_link %>% str_remove(".*q=")
    [1] "-20.460039,-54.634191"
    

    When I put those coordinates into Google, the location looked the same as on the original map.
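
    The question asks for the degrees, minutes and seconds form, while the link holds decimal degrees. Below is a minimal base R sketch of that conversion. It assumes the link has the format shown earlier; to_dms is a hypothetical helper (not from any package), KEY is a placeholder for the API key, and the helper does not handle the edge case where the seconds round up to 60.0.

    ```r
    # Link in the format shown earlier; KEY is a placeholder for the real API key
    map_link <- "https://www.google.com/maps/embed/v1/place?key=KEY&q=-20.460039,-54.634191"

    # Keep only the value of the q= parameter, then split it into two numbers
    coords <- sub(".*[?&]q=([^&]*).*", "\\1", map_link)
    lat_lon <- as.numeric(strsplit(coords, ",", fixed = TRUE)[[1]])

    # Hypothetical helper: decimal degrees -> degrees, minutes, seconds.
    # `pos` and `neg` are the hemisphere letters for positive and negative values.
    to_dms <- function(x, pos, neg) {
      hemi <- if (x < 0) neg else pos
      x <- abs(x)
      deg <- floor(x)
      minutes <- floor((x - deg) * 60)
      seconds <- ((x - deg) * 60 - minutes) * 60
      sprintf("%dº%02d'%04.1f\"%s", as.integer(deg), as.integer(minutes), seconds, hemi)
    }

    dms <- paste(to_dms(lat_lon[1], "N", "S"), to_dms(lat_lon[2], "E", "W"))
    dms
    ```

    With the link above, dms is 20º27'36.1"S 54º38'03.1"W, which matches the values given in the question.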