Tidyverse: Match word in string from list of keywords


I'm trying to write some code that checks whether a string contains any word from a list of terms, so that I can create a new column in the dataframe.

This is the list of terms: vehicles <- c('vehicle', 'mazda', 'nissan', 'ford', 'honda', 'chevrolet', 'toyota')

Examples of the strings I'm searching include: "2001 honda civic", "2003 nissan altima", "2005 mazda 5", etc. (these are the asset_name in the code below).

My simplified code looks like this:

df %>%
  mutate(
    asset_type = case_when(
      vehicles %in% asset_name == TRUE ~ 'vehicle', # this doesn't work, obviously
      <CODE THAT DOES WORK HERE!!!>
      TRUE ~ asset_name
    )
  )

I've tried str_detect, str_extract, grepl, and a custom function, but can't figure out how to make this work.

I know that for each asset_name entry, I need to loop through the list of vehicles to see if one of the vehicle models is in asset_name but I can't seem to make it work. grr...

Thanks in advance!!!


Solution

  • One approach is to combine the vehicle terms into a single regex alternation, wrapped in word boundaries (\b) so that only whole words match, and then use grepl to test each asset_name:

    library(dplyr)

    vehicles <- c('vehicle', 'mazda', 'nissan', 'ford', 'honda', 'chevrolet', 'toyota')
    # Build one pattern: \b(?:vehicle|mazda|...|toyota)\b matches any term as a whole word
    regex <- paste0("\\b(?:", paste(vehicles, collapse="|"), ")\\b")

    df %>%
        mutate(
            asset_type = case_when(
                grepl(regex, asset_name) ~ 'vehicle',
                # add further conditions here
                TRUE ~ asset_name  # fall back to the original name
            )
        )
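Since the question mentions str_detect, here is a minimal sketch of the same idea using stringr on a small made-up data frame. Only the first three asset names come from the question; the other rows, and the names `pattern` and `result`, are assumptions added for illustration. Case-insensitive matching via `regex(..., ignore_case = TRUE)` is also an addition, on the guess that real asset names may be capitalized.

```r
library(dplyr)
library(stringr)

vehicles <- c("vehicle", "mazda", "nissan", "ford", "honda", "chevrolet", "toyota")

# Collapse the keywords into one alternation with word boundaries on both sides
pattern <- paste0("\\b(?:", paste(vehicles, collapse = "|"), ")\\b")

# Hypothetical sample data; rows 4 to 6 are invented to show the edge cases
df <- data.frame(
  asset_name = c(
    "2001 honda civic",
    "2003 nissan altima",
    "2005 mazda 5",
    "2004 Ford F-150",
    "office desk",
    "affordable chair"
  ),
  stringsAsFactors = FALSE
)

result <- df %>%
  mutate(
    asset_type = case_when(
      # str_detect is vectorised over asset_name, so no explicit loop is needed
      str_detect(asset_name, regex(pattern, ignore_case = TRUE)) ~ "vehicle",
      TRUE ~ asset_name
    )
  )

print(result)
```

In this sketch the first four rows get asset_type "vehicle" and the last two keep their original names. "affordable chair" is not matched even though it contains "ford", because the word boundaries require the keyword to stand alone as a word.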