r dataframe aggregate frequency text-mining

Frequency of strings and their IDs in a dataframe using R

The goal is to count how often each distinct string occurs in a text column and to list the IDs of the rows in which it occurs.

Suppose Sample is a dataframe as shown below:

Sample <- data.frame(ID  = c('1', '2', '3', '4', '5', '6'),
                     Var = c('How are you',
                             'Do not go',
                             'How are you',
                             'Please go',
                             'How are you',
                             'Do not go'))

The following command tabulates the frequency of each string in the column Var:

as.data.frame(table(unlist(strsplit(tolower(Sample$Var), ', '))))

Is there a way to include the associated IDs alongside each frequency in the same table?

Solution

Base R solution:

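# Split the rows by distinct string, summarise each group, then stack the results
data.frame(do.call(rbind, lapply(split(Sample, Sample$Var), function(x) {
  # One row per string: its count and a comma-separated list of its IDs
  data.frame(Var = unique(x$Var), Freq = nrow(x), ID = toString(x$ID))
})), row.names = NULL, stringsAsFactors = FALSE)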
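
An alternative sketch, not part of the answer above, builds the same kind of table with base R's aggregate(): one call counts the rows per string, a second collapses the IDs with toString(), and merge() joins the two on Var. The names freq, ids and result are illustrative only, and the sample data is repeated so the block runs on its own.

```r
# Sample data from the question (stringsAsFactors set explicitly for older R versions)
Sample <- data.frame(ID  = c('1', '2', '3', '4', '5', '6'),
                     Var = c('How are you', 'Do not go', 'How are you',
                             'Please go', 'How are you', 'Do not go'),
                     stringsAsFactors = FALSE)

# Count the rows for each distinct string
freq <- aggregate(ID ~ Var, data = Sample, FUN = length)
names(freq)[2] <- "Freq"

# Collapse the IDs for each distinct string into one comma-separated value
ids <- aggregate(ID ~ Var, data = Sample, FUN = toString)

# Join counts and ID lists on the string itself
result <- merge(freq, ids, by = "Var")
print(result)
```

Unlike the command in the question, this sketch does not lowercase Var first; wrapping the column in tolower() before aggregating would make the grouping case-insensitive.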