I was trying to extract multi-word terms from text using a for loop.
The following code gives me an error which says "replacement has length zero"
and "number of items to replace is not a multiple of replacement length".
To make my question clear, consider the following situation.
library(tm)
library(stringr)
library(stringi)
mydata <- data.frame(id = c(1, 2, 3),
                     text = c("This is text mining exercise",
                              "Text analysis is bit confusing",
                              "Hint on this text analysis?"))
multiwords <- c("text", "analysis", "bit confusing")
txt <- freq <- list()
for (i in 1:length(mydata$id)) {
  txt[i] <- str_extract_all(mydata[i, ], paste0(multiwords, collapse = "|"))
  freq[i] <- table(txt[i])
}
Note that not every term in multiwords
necessarily appears in each iteration.
If we need a table over all the extracted elements, use str_extract_all
on the 'text' column after collapsing 'multiwords' into a single pattern,
then unlist the resulting list and call table on it:
library(stringr)
lst1 <- str_extract_all(mydata$text, str_c(multiwords, collapse="|"))
table(unlist(lst1))
#     analysis bit confusing          text
#            2             1             2
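Note that the pattern is case-sensitive, so "Text" at the start of the second row is not counted above. If case-insensitive counts are wanted (an assumption, since the question does not say so), a minimal sketch using stringr's regex(ignore_case = TRUE) and str_to_lower could look like this; the names pattern, matches and counts_ci are only illustrative:

```r
library(stringr)

mydata <- data.frame(
  id = c(1, 2, 3),
  text = c("This is text mining exercise",
           "Text analysis is bit confusing",
           "Hint on this text analysis?"),
  stringsAsFactors = FALSE
)
multiwords <- c("text", "analysis", "bit confusing")

# Build one alternation pattern: "text|analysis|bit confusing"
pattern <- str_c(multiwords, collapse = "|")

# regex(ignore_case = TRUE) lets "Text" match the term "text"
matches <- str_extract_all(mydata$text, regex(pattern, ignore_case = TRUE))

# Lower-case the matches so "Text" and "text" are counted together
counts_ci <- table(str_to_lower(unlist(matches)))
counts_ci
```

With this variant "text" is counted three times instead of two, because the capitalised occurrence in the second row is now included.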
If we need to apply table
to each element of the list, use lapply:
lapply(lst1, table)
#[[1]]
#text
#   1
#
#[[2]]
#     analysis bit confusing
#            1             1
#
#[[3]]
#analysis     text
#       1        1
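If a for loop is still preferred, the original attempt can probably be repaired with two changes. The loop in the question most likely fails because mydata[i, ] passes the whole row (id and text) to str_extract_all, and because single-bracket assignment (txt[i] <- ..., freq[i] <- ...) tries to squeeze a result of a different length into one slot. A sketch under those assumptions, indexing only the text column and assigning with [[ ]]:

```r
library(stringr)

mydata <- data.frame(
  id = c(1, 2, 3),
  text = c("This is text mining exercise",
           "Text analysis is bit confusing",
           "Hint on this text analysis?"),
  stringsAsFactors = FALSE
)
multiwords <- c("text", "analysis", "bit confusing")
pattern <- str_c(multiwords, collapse = "|")

# Pre-allocate one list slot per row
n <- nrow(mydata)
txt <- vector("list", n)
freq <- vector("list", n)

for (i in seq_len(n)) {
  # Pass a single string; the result is a list of length 1, so take [[1]]
  txt[[i]] <- str_extract_all(mydata$text[i], pattern)[[1]]
  # [[ ]] stores the whole table as one list element, whatever its length
  freq[[i]] <- table(txt[[i]])
}
freq
```

This should give the same per-row tables as lapply(lst1, table); rows with no matching term end up with an empty table rather than an error.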