html r web-scraping rvest

Return the same number of elements from HTML using rvest


I am trying to scrape the city name and address of every Apple store in the UK using rvest:

library(rvest)
library(xml2)
library(tidyverse)

my_url <- read_html("https://www.apple.com/uk/retail/storelist/")

# extract city name 
city_name <- my_url %>% html_elements("h2") %>% html_text2()
length(city_name)
# 27 cities

address <- my_url %>% html_elements("address") %>% html_text2()
length(address)
# 38 addresses

I am getting more addresses than city names because some cities have multiple stores. How do I get the same number of city names and addresses so that I can put them in a data frame?


Solution

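  • Each city sits in its own div with class "state". You can loop over those divs, take the single h2 and every address inside each one, and let data.frame() repeat the city name for each of that city's addresses: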

    library(rvest)
    library(xml2)
    library(tidyverse)
    
    read_html("https://www.apple.com/uk/retail/storelist/") %>% 
      html_elements(xpath = "//div[@class='state']") %>%
      lapply(function(x) {
        data.frame(city = html_element(x, "h2") %>% html_text(), 
                   address = html_elements(x, "address") %>% html_text2())}) %>%
      do.call(rbind, .) %>%
      as_tibble()
    #> # A tibble: 38 x 2
    #>    city            address                                                      
    #>    <chr>           <chr>                                                        
    #>  1 Aberdeen        "27/28 Ground Level Mall\nUnion Square\nAberdeen , AB11 ~
    #>  2 Antrim          "Upper Ground Floor\n1 Victoria Square\nBelfast , BT1 4Q~
    #>  3 Berkshire       "The Oracle Shopping Centre\nUpper Level\nReading , RG1 ~
    #>  4 Bristol         "11 Philadelphia Street\nQuakers Friars\nBristol , BS1 3~
    #>  5 Bristol         "Upper Mall\nThe Mall at Cribbs Causeway\nBristol , BS34~
    #>  6 Buckinghamshire "26 Midsummer Place\nMidsummer Boulevard\nMilton Keynes ~
    #>  7 Cambridgeshire  "Grand Arcade Shopping Centre\nCambridge , CB2 3AX\n0122~
    #>  8 Cardiff         "63-66 Grand Arcade\nSt David’s Dewi Sant\nCardiff , CF1~
    #>  9 Central London  "No. 1-7 The Piazza\nLondon , WC2E 8HB\n020 7447 1400"    
    #> 10 Central London  "235 Regent Street\nLondon , W1B 2EL\n020 7153 9000"      
    #> # ... with 28 more rows
    

    Created on 2022-04-12 by the reprex package (v2.0.1)
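
The same grouping idea can be tried offline on a small inline document. This is only a minimal sketch: the HTML below is a made-up stand-in for the store list page (it assumes one div.state per city, each with one h2 and one or more address elements), and the names page, groups, rows and stores are illustrative.

```r
library(rvest)

# Hypothetical stand-in for the real page: two cities, three stores in total.
page <- minimal_html('
  <div class="state"><h2>Aberdeen</h2><address>Union Square</address></div>
  <div class="state"><h2>Bristol</h2>
    <address>Quakers Friars</address>
    <address>Cribbs Causeway</address>
  </div>
')

# One node per city group.
groups <- html_elements(page, "div.state")

# html_element() returns a single match per group and html_elements() returns
# all matches, so data.frame() recycles the one city name across that
# group's addresses.
rows <- lapply(groups, function(g) {
  data.frame(
    city = html_text2(html_element(g, "h2")),
    address = html_text2(html_elements(g, "address")),
    stringsAsFactors = FALSE
  )
})

# Stack the per-city data frames into one table with a row per store.
stores <- do.call(rbind, rows)
print(stores)
```

Because the city is looked up relative to each group rather than on the whole page, every address row carries the name of its own city. A group with no address elements would make data.frame() fail on mismatched lengths, so such groups would need to be filtered out first if the real page contains any.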