say I have the following data with addresses, i.e. street names. My goal is to separate street names from house numbers.
mydf <- tribble(
~street,
"Some Way 10",
"Shiny Street 12b",
"Dark Street from Netflix Movie 17c - 17d",
"Seasame Street",
"Dark Alley 15c",
)
mydf <- mydf %>% mutate(street= str_squish(street)) # get rid of whitespace
I tried the following
sub <- tidyr::extract(mydf, "street", c("street_name_only", "house_number"), "(\\D+)(\\d.*)") %>%
print(n=5)
which works fine, as long as there is a street or house number present. If the string "street" is without a street number, then NAs will show up in the new variable "street_name_only" as well as "house_number", as is the case with "Sesame Street". ( I would like to have "Sesame Street" in the "new_street_column" and ideally "" (empty) in the house_number column, though I could mange the NA in the house_number column afterwards).
Could anybody tell me where I went wrong and how to solve this issue?
Thank you very much in advance.
Will this work:
mydf %>%
transmute(street_name_only = str_remove(street, '\\d.*'),
house_number = str_extract(street, '\\d.*'))
# A tibble: 5 x 2
street_name_only house_number
<chr> <chr>
1 "Some Way " 10
2 "Shiny Street " 12b
3 "Dark Street from Netflix Movie " 17c - 17d
4 "Seasame Street" NA
5 "Dark Alley " 15c