I'm trying to get the geographic coordinates from an HTML page using functions from the RSelenium package in R. The goal is to get the values 20º27'36.1"S 54º38'03.1"W. The code below shows my attempts. I'm grateful for any help.
library(rvest)
library(RSelenium)
library(httpuv)
port <- httpuv::randomPort()
rD <- rsDriver(browser = c("firefox"),
               verbose = TRUE,
               check = FALSE,
               port = port)
driver <- rD[["client"]]
urll <- "https://www.zapimoveis.com.br/lancamento/venda-apartamento-2-quartos-bairro-seminario-campo-grande-ms-46m2-id-2600496487/"
driver$navigate(urll)
politicas <- driver$findElement(using = "css",
                                value = "button.cookie-notifier__cta")
politicas$clickElement()
botaomapa <- driver$findElement(using = "xpath", "/html/body/main/div[1]/section/section/section[1]/button[2]")
botaomapa$clickElement()
# Attempt 1: using the XPath of the coordinates element
coord <- driver$findElement(using = "xpath", "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div") # error
# Attempt 2: searching from the botaomapa element
coord <- botaomapa$findElement(using = "xpath", "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div") # error
# Attempt 3: with the rvest package
readmap <- read_html(urll)
auxiliar <- readmap %>% html_elements("section")
auxiliar2 <- auxiliar %>% html_elements("#listing-map")
c1 <- readmap %>% html_nodes(xpath = "/html/body/main/div[1]/section/section/section[2]/div/div[3]/article/iframe") # returns nothing
c2 <- auxiliar2 %>% html_nodes(xpath = "/html/body/main/div[1]/section/section/section[2]/div/div[3]/article/iframe") # returns nothing
c3 <- auxiliar2 %>% html_nodes(xpath = "/html/body/div/div/div/div[4]/div/div/div/div/div[1]/div") # returns nothing
The tricky bit is that the map is contained in an iframe, and it's hard to access anything inside an iframe. You can, however, find the iframe element itself and read its attributes. The link in the iframe's src attribute contains the coordinates, so you can extract that link and then pull the coordinates out of it.
After this step in your original code:
politicas$clickElement()
I did this:
library(stringr)
library(rvest)
# pull the webpage html
html <- driver$getPageSource()[[1]]
# look for the iframe's node
# then pull the source attribute
map_link <- html %>%
  read_html() %>%
  html_node(".map-embed__iframe") %>%
  html_attr("src")
Here's what the link looks like:
map_link
[1] "https://www.google.com/maps/embed/v1/place?key=AIzaSyB1BH90qSMLRWrSEKe8D7fml7-kWHN2qjY&q=-20.460039,-54.634191"
Then you can use a regular expression to extract the coordinates:
# remove everything up to and including q=
map_link %>% str_remove(".*q=")
[1] "-20.460039,-54.634191"
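Since the question asks for the values in the form 20º27'36.1"S 54º38'03.1"W, here is a rough sketch of how the decimal pair could be turned into that format using base R only. The `to_dms()` helper is hypothetical (it is not part of any package), and the link below uses a placeholder key. The pattern assumes the `q=` parameter always holds a plain `lat,lon` pair, which is what the link above shows but may not hold for every listing.

```r
# Same shape as the iframe link above, with a placeholder key (assumption).
map_link <- "https://www.google.com/maps/embed/v1/place?key=YOUR_KEY&q=-20.460039,-54.634191"

# Capture the two numbers after q= ; this also works if other
# parameters happen to follow the coordinates in the URL.
parts <- regmatches(map_link,
                    regexec("[?&]q=(-?[0-9.]+),(-?[0-9.]+)", map_link))[[1]]
lat <- as.numeric(parts[2])
lon <- as.numeric(parts[3])

# Hypothetical helper: format one decimal-degree value as
# degrees, minutes, seconds followed by a hemisphere letter.
to_dms <- function(x, positive, negative) {
  hemisphere <- if (x < 0) negative else positive
  x <- abs(x)
  degrees <- floor(x)
  minutes <- floor((x - degrees) * 60)
  seconds <- (x - degrees - minutes / 60) * 3600
  sprintf("%d\u00ba%02d'%04.1f\"%s",
          as.integer(degrees), as.integer(minutes), seconds, hemisphere)
}

lat_dms <- to_dms(lat, "N", "S")  # 20º27'36.1"S
lon_dms <- to_dms(lon, "E", "W")  # 54º38'03.1"W
cat(lat_dms, lon_dms, "\n")
```

One caveat with this sketch: a value whose seconds round up to 60.0 would print as 60.0 rather than carrying over to the next minute, so it would need an extra check if that edge case matters.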
When I put those coordinates into Google, the location looked the same as on the original map.