
Partial String Match based on a list


I want to partially match each string in a list of abbreviated names against a list of proper names, then create a data frame that shows each abbreviated name next to its matching proper name.

I'm sure this is easy but I haven't been able to find it yet.

For example:


library(data.table)


list_abbreviated = c("KF Chicken", "CHI Wendys", "CAL InandOut")

list_proper = c("Kentucky Fried Chicken", "Chicago Wendys", "California InandOut", "Ontario Whataburger")

# I've tried

Pattern = paste(list_proper, collapse="|")

DT_result = data.table(list_abbreviated, result=grepl(Pattern, list_abbreviated ))
DT_result

# This is the result

   list_abbreviated result
1:       KF Chicken  FALSE
2:       CHI Wendys  FALSE
3:     CAL InandOut  FALSE

# I tried other options using %like% to no avail either. 

# This is the output I am looking for

  list_abbreviated result            list_proper
1       KF Chicken   TRUE Kentucky Fried Chicken
2       CHI Wendys   TRUE         Chicago Wendys
3     CAL InandOut   TRUE    California InandOut


Solution

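  • One option is to extract the last word of each abbreviated name and use it as the key for a partial join. regex_inner_join from fuzzyjoin then treats those last words as regular expressions, matches them against the proper names, and merges the two data tables. Because it is an inner join, proper names with no match (here "Ontario Whataburger") are dropped.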

    library(stringi)
    library(fuzzyjoin)
    library(data.table)
    
    list_abbreviated = data.table(list_abbreviated = c("KF Chicken", "CHI Wendys", "CAL InandOut"))
    list_abbreviated[, limited:= stri_extract_last_words(list_abbreviated)]
    
    list_proper = data.table(list_proper = c("Kentucky Fried Chicken", "Chicago Wendys", "California InandOut", "Ontario Whataburger"))
    
    DT_result <- data.table(regex_inner_join(list_proper, list_abbreviated, by = c("list_proper" = "limited")))
    DT_result[,limited:=NULL]
    

    Output

                  list_proper list_abbreviated
    1: Kentucky Fried Chicken       KF Chicken
    2:         Chicago Wendys       CHI Wendys
    3:    California InandOut     CAL InandOut
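
The same last-word idea can also be sketched in base R, without fuzzyjoin or stringi, which makes it easy to produce the exact layout asked for in the question (including the result column). This is only a minimal sketch under some assumptions: the last whitespace-separated word of each abbreviated name is taken as the key, it is matched literally rather than as a regex, and only the first matching proper name is kept. The helper names last_word and idx are illustrative, not from the original answer.

```r
list_abbreviated <- c("KF Chicken", "CHI Wendys", "CAL InandOut")
list_proper <- c("Kentucky Fried Chicken", "Chicago Wendys",
                 "California InandOut", "Ontario Whataburger")

# Assumption: the last whitespace-separated word identifies the match.
# Strings without whitespace are left unchanged by sub().
last_word <- sub(".*\\s", "", list_abbreviated)

# For each key, the index of the first proper name containing it
# (literal match via fixed = TRUE), or NA when nothing matches.
idx <- vapply(last_word, function(w) {
  hits <- grep(w, list_proper, fixed = TRUE)
  if (length(hits) > 0L) hits[1L] else NA_integer_
}, integer(1), USE.NAMES = FALSE)

DT_result <- data.frame(list_abbreviated,
                        result = !is.na(idx),
                        list_proper = list_proper[idx],
                        stringsAsFactors = FALSE)
DT_result
```

Unlike the inner join, this keeps abbreviated names that have no match: they get result = FALSE and an NA in list_proper. If several proper names share the same last word, only the first one is returned, so a stricter key would be needed for that case.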