I am trying to scrape name/address information from yellowpages (https://www.yellowpages.ca/). I have a function (from "(R) Webscraping Error: arguments imply differing number of rows: 1, 0") that retrieves this information:
library(rvest)
library(dplyr)
scraper <- function(url) {
  page <- url %>%
    read_html()

  tibble(
    name = page %>%
      html_elements(".jsListingName") %>%
      html_text2(),
    address = page %>%
      html_elements(".listing__address--full") %>%
      html_text2()
  )
}
However, the address information is not always present. For example, several barbers are listed on this page https://www.yellowpages.ca/search/si/1/barber/Sudbury+ON, and some of them have no address. As a result, the name and address columns have different lengths, and when I run this function I get the following error:
scraper("https://www.yellowpages.ca/search/si/1/barber/Sudbury+ON")
Error:
! Tibble columns must have compatible sizes.
* Size 14: Existing data.
* Size 12: Column `address`.
i Only values of size one are recycled.
Run `rlang::last_error()` to see where the error occurred.
My question: is there some way to modify the definition of the "scraper" function so that, when no address is listed, an NA appears in that row? For example:
     barber    address
1 barber111 address111
2 barber222 address222
3 barber333         NA
Is there some way I could add a statement similar to CASE WHEN that would grab the address when it exists and place an NA when it does not?
To match the businesses with their addresses, it is best to find the root node of each listing and get the text from the relevant child node. If a listing has no matching child node, you can return NA instead:
library(rvest)
library(dplyr)
scraper <- function(url) {
  # One root node per listing, so names and addresses stay paired
  nodes <- read_html(url) %>% html_elements(".listing_right_section")

  tibble(
    name = nodes %>% sapply(function(x) {
      x <- html_text2(html_elements(x, css = ".jsListingName"))
      if (length(x)) x else NA
    }),
    address = nodes %>% sapply(function(x) {
      x <- html_text2(html_elements(x, css = ".listing__address--full"))
      # A listing without an address yields a zero-length result; use NA
      if (length(x)) x else NA
    })
  )
}
So now we can do:
scraper("https://www.yellowpages.ca/search/si/1/barber/Sudbury+ON")
#> # A tibble: 14 x 2
#> name address
#> <chr> <chr>
#> 1 Lords'n Ladies Hair Design 1560 Lasalle Blvd, Sudbury, ON P3A~
#> 2 Jo's The Lively Barber 611 Main St, Lively, ON P3Y 1M9
#> 3 Hairapy Studio 517 & Barber Shop 517 Notre Dame Ave, Sudbury, ON P3~
#> 4 Nickel Range Unisex Hairstyling 111 Larch St, Sudbury, ON P3E 4T5
#> 5 Ugo Barber & Hairstyling 911 Lorne St, Sudbury, ON P3C 4R7
#> 6 Gordon's Hairstyling 19 Durham St, Sudbury, ON P3C 5E2
#> 7 Valley Plaza Barber Shop 5085 Highway 69 N, Hanmer, ON P3P ~
#> 8 Rick's Hairstyling Shop 28 Young St, Capreol, ON P0M 1H0
#> 9 President Men's Hairstyling & Barber Shop 117 Elm St, Sudbury, ON P3C 1T3
#> 10 Pat's Hairstylists 33 Godfrey Dr, Copper Cliff, ON P0~
#> 11 WildRootz Hair Studio 911 Lorne St, Sudbury, ON P3C 4R7
#> 12 Sleek Barber Bar 324 Elm St, ON P3C 1V8
#> 13 Faiella Classic Hair <NA>
#> 14 Ben's Barbershop & Hairstyling <NA>
Created on 2022-09-16 with reprex v2.0.2
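As a possible shorter variant of the same idea, rvest's singular html_element() returns one result per input node and leaves a missing value where nothing matches, so html_text2() gives NA for those listings without an explicit length check. The sketch below runs offline on a small made-up HTML fragment built with minimal_html(); the fragment only assumes the class names used above and is not the real page markup, and it requires rvest 1.0.0 or later.

```r
library(rvest)
library(dplyr)

# Hypothetical markup that imitates the listing structure (not the real page)
page <- minimal_html('
  <div class="listing_right_section">
    <a class="jsListingName">barber111</a>
    <span class="listing__address--full">address111</span>
  </div>
  <div class="listing_right_section">
    <a class="jsListingName">barber333</a>
  </div>
')

# One root node per listing
nodes <- page %>% html_elements(".listing_right_section")

# html_element() (singular) keeps one slot per listing, so both columns
# have the same length; listings with no match come back as NA
result <- tibble(
  name    = nodes %>% html_element(".jsListingName") %>% html_text2(),
  address = nodes %>% html_element(".listing__address--full") %>% html_text2()
)

print(result)
```

In the scraper function, this would mean replacing the two sapply() calls with the two html_element() pipelines, assuming each listing contains at most one name node and one address node.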