Error in text mining: "replacement has length zero" & "number of items to replace is not a multiple of replacement length"


I am trying to extract multi-word terms from text using a for loop.

The code below gives me two errors: "replacement has length zero" and "number of items to replace is not a multiple of replacement length". To make my question clear, consider the following example.

library(tm)
library(stringr)
library(stringi)
mydata <- data.frame(id = c(1, 2, 3),
                     text = c("This is text mining exercise",
                              "Text analysis is bit confusing",
                              "Hint on this text analysis?"))
multiwords <- c("text", "analysis", "bit confusing")
txt <- freq <- list()
for (i in 1:length(mydata$id)) {
    txt[i] <- str_extract_all(mydata[i, ], paste0(multiwords, collapse = "|"))
    freq[i] <- table(txt[i])
}

Note that not every term in multiwords necessarily appears in each iteration.


Solution

  • If we need a table over all of the extracted elements, call str_extract_all on the 'text' column with the 'multiwords' terms pasted together into a single pattern, then unlist the resulting list and pass it to table:

    library(stringr)
    lst1 <- str_extract_all(mydata$text, str_c(multiwords, collapse="|"))
    table(unlist(lst1))
    #    analysis bit confusing          text 
    #           2             1             2 
    

    If we need a separate table for each element of the list, apply table to each one with lapply:

    lapply(lst1, table)
    #[[1]]
    
    #text 
    #   1 
    
    #[[2]]
    
    #     analysis bit confusing 
    #            1             1 
    
    #[[3]]
    
    #analysis     text 
    #       1        1
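
The loop from the question can also be made to work. The following is a minimal sketch, assuming the errors come from the single-bracket assignments: `txt[i] <-` and `freq[i] <-` each fill exactly one list slot, so a result with zero elements or with several elements does not fit. Using `[[i]]` stores the whole result in that slot. The fixed factor levels are an addition for illustration, not part of the original answer; they keep a zero count for any term that does not occur in a given row.

```r
library(stringr)

mydata <- data.frame(
  id = c(1, 2, 3),
  text = c("This is text mining exercise",
           "Text analysis is bit confusing",
           "Hint on this text analysis?"),
  stringsAsFactors = FALSE
)
multiwords <- c("text", "analysis", "bit confusing")
pattern <- str_c(multiwords, collapse = "|")

# Pre-allocate one list slot per row
txt  <- vector("list", nrow(mydata))
freq <- vector("list", nrow(mydata))

for (i in seq_len(nrow(mydata))) {
  # str_extract_all() returns a list; [[1]] pulls out the character vector
  txt[[i]] <- str_extract_all(mydata$text[i], pattern)[[1]]
  # Fixed levels give every term a count, including zero for absent terms
  freq[[i]] <- table(factor(txt[[i]], levels = multiwords))
}

freq
```

Because every per-row table now has the same three entries, the results can be stacked into a single matrix with `do.call(rbind, freq)` if that is more convenient.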