Finding strings that appear in one column but are missing from another, based on a list of strings

I have a data frame, data:

data = data.frame("text" = c("John met Jay who met Jack who met Josh who met Jamie", "John and Jay and Jack and Josh and Jamie"), 
"" = c("Jay; Jack; Josh; Jamie", "John; Jack; Josh; Jamie"), 
"missing.names" = c("",""))

> data
                                                  text                         missing.names
1 John met Jay who met Jack who met Josh who met Jamie  Jay; Jack; Josh; Jamie              
2             John and Jay and Jack and Josh and Jamie John; Jack; Josh; Jamie     


and a second df of names:

names = data.frame("names" = c("John", "Jay", "Jack", "Josh", "Jamie"))
> names
  names
1  John
2   Jay
3  Jack
4  Josh
5 Jamie

I am trying to find out whether data$ contains all the names that appear in data$text. The universe of names is in names$names. Ideally, for each row, data$missing.names would list which of names$names appear in data$text but are missing from data$, like this:

                                                  text                         missing.names
1 John met Jay who met Jack who met Josh who met Jamie  Jay; Jack; Josh; Jamie          John
2             John and Jay and Jack and Josh and Jamie John; Jack; Josh; Jamie           Jay

Or any other configuration that would easily tell me what names are in the text but missing from

So essentially, I want to find which names$names appear in data$text but not in data$, and then list those names in data$missing.names.


  • A tidyverse solution:

    library(tidyverse)  # provides %>%, mutate(), map2_chr() and the str_* helpers

    data %>%
      mutate(missing.names = map2_chr(text, str_split(, '; '),
                                      ~ str_c(str_extract_all(.x, regex(str_c(setdiff(names$names, .y), collapse = '|')))[[1]], collapse = '; ')))
    # # A tibble: 2 × 3
    #   text                                                                         missing.names
    #   <chr>                                                <chr>                   <chr>        
    # 1 John met Jay who met Jack who met Josh who met Jamie Jay; Jack; Josh; Jamie  John         
    # 2 John and Jay and Jack and Josh and Jamie             John; Jack; Josh; Jamie Jay
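
For comparison, here is a minimal base R sketch of the same idea, not a definitive implementation. The column name `listed` is a placeholder made up for this example, because the name of the second column does not appear in the post, and `find_missing` is likewise a hypothetical helper. Unlike the pipeline above, it matches whole words with `\\b`, which assumes the names contain no regex metacharacters.

```r
# NOTE: `listed` is a hypothetical stand-in for the unnamed column in the post.
data <- data.frame(
  text = c("John met Jay who met Jack who met Josh who met Jamie",
           "John and Jay and Jack and Josh and Jamie"),
  listed = c("Jay; Jack; Josh; Jamie", "John; Jack; Josh; Jamie"),
  stringsAsFactors = FALSE
)
names <- data.frame(names = c("John", "Jay", "Jack", "Josh", "Jamie"),
                    stringsAsFactors = FALSE)

# For one row: return the names from `universe` that occur in `text` as whole
# words but are absent from the semicolon-separated `listed` string.
# (`find_missing` is a hypothetical helper, not part of any package.)
find_missing <- function(text, listed, universe) {
  already <- trimws(strsplit(listed, ";", fixed = TRUE)[[1]])
  candidates <- setdiff(universe, already)
  in_text <- vapply(
    candidates,
    function(nm) grepl(paste0("\\b", nm, "\\b"), text, perl = TRUE),
    logical(1)
  )
  paste(candidates[in_text], collapse = "; ")
}

# Apply the helper row by row and store the result in missing.names.
data$missing.names <- mapply(find_missing, data$text, data$listed,
                             MoreArgs = list(universe = names$names),
                             USE.NAMES = FALSE)
data$missing.names
# [1] "John" "Jay"
```

A row where nothing is missing gets an empty string, so `data$missing.names == ""` can serve as a quick check that the second column is complete.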