I am trying to write a function that returns the stem map of the words in a text after Porter stemming. When I ran an example, the code never finished and produced no output. There was no error, but when I force-stopped it, it gave warnings like:
1: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
2: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
3: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
4: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
5: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
6: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
7: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
8: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
9: In stemList[length(stemList) + 1][2] <- flatText[i] :
number of items to replace is not a multiple of replacement length
My code is as follows:
stemMAP<-function(text){
  flatText<-unlist(strsplit(text," "))
  textLength<-length(flatText)
  stemList<-list(NULL)
  for(i in 1:textLength){
    wordStem<-SnowballStemmer(flatText[i])
    flagStem=0
    flagWord=0
    for(j in 1:length(stemList)){
      if(regexpr(wordStem,stemList[j][1])==TRUE){
        for(k in 1:length(stemList[j])){
          if(regexpr(flatText[i],stemList[j][k])==TRUE){
            flagWord=1
            #break;
          }
        }
        if(flagWord==0){
          stemList[j][length(stemList[j])+1]<-flatText[i]
          #break;
        }
        flagStem=1
      }
      if(flagStem==0){
        stemList[length(stemList)+1][1]<-wordStem
        stemList[length(stemList)+1][2]<-flatText[i]
      }
    }
  }
  return(stemList)
}
How can I identify the mistakes? My test statement was:
stem<-stemMAP("I like being active and playing because when you play it activates your body and this activation leads to a good health")
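As for the warnings: they most likely come from indexing the list with single brackets. In R, stemList[j] returns a list of length one, not the element itself, so length(stemList[j]) is always 1, and an assignment such as stemList[n][2] <- value tries to put a two-item list into a single slot. The if(flagStem==0) block also sits inside the j loop, so it appears that new entries are appended on every pass and the list grows very quickly, which may be why the function seemed never to finish. A small sketch with a made-up list x (not taken from the question) shows the difference between [ and [[:

```r
# Hypothetical list, for illustration only: each element holds a stem
# followed by the words that map to it.
x <- list(c("activ", "active"), c("be", "being"))

class(x[1])     # single brackets return a sub-list
length(x[1])    # the sub-list has length 1, however many words it holds
class(x[[1]])   # double brackets return the character vector itself
length(x[[1]])  # the number of strings stored in that element

# Appending a word to the first entry needs double brackets
x[[1]] <- c(x[[1]], "activates")
x[[1]]
```

With [[ ]] the inner checks would look at the stored words themselves instead of at a one-element list wrapped around them.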
Here I rewrite your code using the vectorized version of SnowballStemmer. There is no need for a for loop.
library(plyr)
library(RWeka)  ## assumption: SnowballStemmer() comes from the RWeka package
stemMAP <- function(text){
  flatText <- unlist(strsplit(text, " "))
  ## here I use the vectorized version: all words are stemmed in one call
  wordStem <- as.character(SnowballStemmer(flatText))
  hh <- data.frame(ff = flatText, sn = wordStem)
  ## I use plyr to transform the result to a list
  ## dlply: data.frame to list apply
  ## we group hh by the column sn and apply the function
  ## as.character(x$ff) to each group (x here is a subset data.frame)
  stemList <- dlply(hh, .(sn), function(x) as.character(x$ff))
  stemList
}
stemList
$I
[1] "I"
$a
[1] "a"
$activ
[1] "active" "activates" "activation"
$and
[1] "and" "and"
$be
[1] "being"
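If you would rather not depend on plyr, base R's split() can do the same grouping. This is only a minimal sketch: the stems are hard-coded from the output above instead of being computed by SnowballStemmer, so that it runs without a stemmer installed, and the names words and stems are illustrative.

```r
# Words and their stems. The stems are copied from the output above
# rather than computed, so no stemmer package is needed here.
words <- c("active", "and", "activates", "being", "activation", "and")
stems <- c("activ",  "and", "activ",     "be",    "activ",      "and")

# split() groups the words by stem and returns a named list
stemList <- split(words, stems)

# Optionally drop repeated words within each group
stemList <- lapply(stemList, unique)
stemList$activ
```

In the real function you would pass flatText and wordStem to split() instead of the hard-coded vectors.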