Search code examples
rextracttidyr

R extract house / street numers from adress string


say I have the following data with addresses, i.e. street names. My goal is to separate street names from house numbers.

mydf <- tribble(
  ~street,
  "Some Way 10",
  "Shiny Street   12b",
  "Dark Street   from Netflix Movie 17c - 17d",
  "Seasame Street",
  "Dark Alley 15c",
)

mydf <- mydf %>% mutate(street= str_squish(street)) # get rid of whitespace

I tried the following

sub <- tidyr::extract(mydf, "street", c("street_name_only", "house_number"), "(\\D+)(\\d.*)") %>%
 print(n=5)

which works fine, as long as there is a street or house number present. If the string "street" is without a street number, then NAs will show up in the new variable "street_name_only" as well as "house_number", as is the case with "Sesame Street". ( I would like to have "Sesame Street" in the "new_street_column" and ideally "" (empty) in the house_number column, though I could mange the NA in the house_number column afterwards).

Could anybody tell me where I went wrong and how to solve this issue?

Thank you very much in advance.


Solution

  • Will this work:

    mydf %>% 
        transmute(street_name_only = str_remove(street, '\\d.*'), 
                      house_number = str_extract(street, '\\d.*'))
    
    # A tibble: 5 x 2
      street_name_only                  house_number
      <chr>                             <chr>       
    1 "Some Way "                       10          
    2 "Shiny Street "                   12b         
    3 "Dark Street from Netflix Movie " 17c - 17d   
    4 "Seasame Street"                  NA          
    5 "Dark Alley "                     15c