html, r, web-scraping, rvest

Using rvest to extract latitude and longitude


I am trying to get the latitude and longitude of all stores listed at https://www.wellcome.com.hk/en/our-store

On inspecting the page, I can see that the latitude and longitude are contained within a div.

library(dplyr)
library(rvest)

url_company <- rvest::read_html("https://www.wellcome.com.hk/en/our-store") 
url_company %>%
 html_elements("div") %>% # extract all div elements
 html_elements("p") # extract all p elements inside them

How do I get to the data-lat and data-lng attributes?
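
For reference, a minimal sketch of reading such attributes with `html_attr()`. It runs on a small hypothetical HTML snippet, not the real page, and it assumes the `data-lat` and `data-lng` attributes are present in the HTML that `read_html()` receives. If the page builds those divs with JavaScript, the same selector would return nothing on the live site.

```r
library(rvest)

# Hypothetical markup standing in for the store page; the real page may differ.
page <- minimal_html('
  <div class="store" data-lat="22.4" data-lng="114.0"><p>Store A</p></div>
  <div class="store" data-lat="22.3" data-lng="114.2"><p>Store B</p></div>
')

# Select only the divs that carry a data-lat attribute.
stores <- html_elements(page, "div[data-lat]")

# html_attr() returns character vectors, so convert them to numbers.
coords <- data.frame(
  lat = as.numeric(html_attr(stores, "data-lat")),
  lng = as.numeric(html_attr(stores, "data-lng"))
)
coords
```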


Solution

  • A different way, not using V8:

    
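    library(rvest)     # read_html(), html_nodes(), html_text(), %>%
    library(stringr)   # str_split()
    library(jsonlite)  # fromJSON()

    url <- "https://www.wellcome.com.hk/en/our-store"

    url %>%
      read_html %>%
      html_nodes(css = ".content > div:nth-child(1) > script:nth-child(3)") %>% 
      html_text %>%
      str_split("googleMapData = |;\n\n\nvar googleMapLocation =") %>%
      {.[[1]][2]} %>%   # keep the JSON text between the two markers
      fromJSON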
    

    Output:

       name    addr  name_zh addr_zh tel   time  time_zh region district   lat   lng
       <chr>   <chr> <chr>   <chr>   <chr> <chr> <chr>   <chr>  <chr>    <dbl> <dbl>
     1 Ching … Shop… 菁田    屯門菁… 2317… 08:0… 08:00-… 32     24        22.4  114.
     2 Lei Ki… Shop… 鯉景灣  香港西… 2815… 07:3… 07:30-… 30     161       22.3  114.
     3 Garden… Shop… 花園大… 官塘牛… 2372… 08:0… 08:00 … 31     96        22.3  114.
     4 Tsuen … 57-6… 荃灣    荃灣路… 2411… 07:0… 07:00-… 32     23        22.4  114.
     5 Pak Ti… Shop… 白田邨  九龍白… 2335… 08:0… 08:00-… 31     166       22.3  114.
     6 Tak Bo… Shop… 得寶花… 九龍牛… 2382… 08:0… 08:00-… 31     35        22.3  114.
     7 Dor He… Shop… 多喜大… 九龍牛… 2628… 09:0… 09:00-… 31     216       22.3  114.
     8 Cheval… Shop… 其士大… 九龍尖… 2713… 08:0… 08:00-… 31     15        22.3  114.
     9 Shan K… Stal… 山景2   屯門鳴… 2653… 07:3… 07:30-… 32     24        22.4  114.
    10 Shek M… Shop… 石門    沙田安… 2854… 08:0… 08:00-… 32     21        22.4  114.
    # ℹ 269 more rows
    

    I assume you want the name, lat, and lng for each store, but you might want only the lat and lng columns, or name_zh as well. Without more specific guidance, I'll leave it there.
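
The split-and-parse step can be tried offline, followed by keeping only the name, lat, and lng columns. This is a sketch under stated assumptions: `script_text` is a hypothetical stand-in for the text of the page's script element, and the two store records in it are invented for illustration.

```r
library(stringr)
library(jsonlite)

# Hypothetical script text imitating the structure the answer splits on;
# the store records are invented for illustration.
script_text <- paste0(
  'var googleMapData = [{"name":"Store A","lat":22.4,"lng":114.0},',
  '{"name":"Store B","lat":22.3,"lng":114.2}]',
  ';\n\n\nvar googleMapLocation = {};'
)

# Split on the text before and after the JSON array; the array is the middle piece.
pieces <- str_split(script_text, "googleMapData = |;\n\n\nvar googleMapLocation =")
json <- pieces[[1]][2]

# An array of JSON objects is parsed into a data frame, one row per object.
stores <- fromJSON(json)

# Keep only the columns of interest.
stores[, c("name", "lat", "lng")]
```

On the real result, the same indexing should apply because the output above lists name, lat, and lng among its columns.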