r string loops grepl edge-list

Extract names from a string using a list of names with grepl and a loop and add them to a new column in R


I have a dataset with a column containing names and a column indicating what the person did during the day. I am trying to figure out who met with whom in my dataset during that day using R. I created a vector containing the names in the dataset and used grepl in a loop to identify where the names appear in the column detailing the activity of the people in the dataset.

name <- c("Dupont","Dupuy","Smith") 

activity <- c("On that day, he had lunch with Dupuy in London.", 
              "She had lunch with Dupont and then went to Brighton to meet Smith.", 
              "Smith remembers that he was tired on that day.")

met_with <- c("Dupont","Dupuy","Smith")

df <- data.frame(name, activity, met_with = NA)

for (i in seq_along(met_with)) {
  df$met_with <- ifelse(grepl(met_with[i], df$activity), met_with[i], df$met_with)
}

However, this solution is unsatisfactory for two reasons. First, I can't extract more than one name when a person met with more than one other person (e.g. Dupuy in my example). Second, I can't tell R to skip a person's own name when it appears in their activity in place of a pronoun (e.g. Smith).

Ideally, I would like the df to look like:

  name         activity                                            met_with                             
  Dupont       On that day, he had lunch with Dupuy in London.     Dupuy
  Dupuy        She had lunch with Dupont and then (...).           Dupont Smith
  Smith        Smith remembers that he was tired on that day.      NA

I am cleaning up the strings to construct an edge list and node list to conduct network analysis later on.

Thank you


Solution

  • Same logic as @Gki, but using stringr functions and mapply instead of a loop.

    library(stringr)
    
    pat <- str_c('\\b', df$name, '\\b', collapse = '|')
    df$met_with <- mapply(function(x, y) str_c(setdiff(x, y), collapse = ' '), 
           str_extract_all(df$activity, pat), df$name)
    
    df
    
    #    name                                                           activity
    #1 Dupont                    On that day, he had lunch with Dupuy in London.
    #2  Dupuy She had lunch with Dupont and then went to Brighton to meet Smith.
    #3  Smith                     Smith remembers that he was tired on that day.
    
    #      met_with
    #1        Dupuy
    #2 Dupont Smith
    #3
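
For comparison, here is a minimal base R sketch of the same idea, without stringr. It is only one possible variant, not the answerer's code. It differs in two ways: it returns NA rather than an empty string when nobody else is mentioned (the output above leaves row 3 blank, while the question asked for NA), and it also builds a long-format edge list, since the question mentions one as the end goal. The object names `found`, `others` and `edges` are made up for illustration, and the sketch assumes the names contain no regex metacharacters.

```r
# Sample data from the question
name <- c("Dupont", "Dupuy", "Smith")
activity <- c("On that day, he had lunch with Dupuy in London.",
              "She had lunch with Dupont and then went to Brighton to meet Smith.",
              "Smith remembers that he was tired on that day.")
df <- data.frame(name, activity, stringsAsFactors = FALSE)

# One alternation pattern with word boundaries around the known names
# (assumes the names contain no regex metacharacters)
pat <- paste0("\\b(", paste(df$name, collapse = "|"), ")\\b")

# All known names mentioned in each activity: a list of character vectors
found <- regmatches(df$activity, gregexpr(pat, df$activity, perl = TRUE))

# Drop each person's own name from their own row
others <- Map(setdiff, found, df$name)

# Collapse to one string per row; NA when nobody else is mentioned
df$met_with <- vapply(others, function(x) {
  if (length(x) == 0) NA_character_ else paste(x, collapse = " ")
}, character(1))

# Long-format edge list: one row per (from, to) pair
edges <- data.frame(from = rep(df$name, lengths(others)),
                    to   = unlist(others),
                    stringsAsFactors = FALSE)
```

With the sample data, `df$met_with` should hold "Dupuy", "Dupont Smith" and NA, and `edges` should have three rows. A two-column data frame like this is the usual input for network packages, for example `igraph::graph_from_data_frame()`.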