r, debugging, porter-stemmer

Debugging in R: how do I locate the error?


I am trying to write a function that returns the stem map of the words in a text after Porter stemming. When I ran an example, the code never finished and produced no output. No error was raised, but when I force-stopped it, I got warnings like:

1: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
2: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
3: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
4: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
5: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
6: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
7: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
8: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length
9: In stemList[length(stemList) + 1][2] <- flatText[i] :
  number of items to replace is not a multiple of replacement length

My code is as follows:

stemMAP<-function(text){
  flatText<-unlist(strsplit(text," "))
  textLength<-length(flatText)

  stemList<-list(NULL)
  for(i in 1:textLength){
    wordStem<-SnowballStemmer(flatText[i])
    flagStem=0
    flagWord=0

    for(j in 1:length(stemList)){
      if(regexpr(wordStem,stemList[j][1])==TRUE){

        for(k in 1:length(stemList[j])){
          if(regexpr(flatText[i],stemList[j][k])==TRUE){ 
            flagWord=1
            #break;
            }
         }

        if(flagWord==0){
          stemList[j][length(stemList[j])+1]<-flatText[i]
          #break;
        }

        flagStem=1

      }

      if(flagStem==0){
        stemList[length(stemList)+1][1]<-wordStem
        stemList[length(stemList)+1][2]<-flatText[i]
      }

    }

  }

  return(stemList)
}

How can I identify the mistakes? My test statement was:

stem<-stemMAP("I like being active and playing because when you play it activates your body and this activation leads to a good health")

Solution

  • Here is a rewrite of your code that uses the vectorized form of SnowballStemmer, so no for loop is needed.

    library(plyr)
    ## SnowballStemmer is assumed to be loaded already, as in the question
    stemMAP <- function(text){
      flatText <- unlist(strsplit(text, " "))
      ## SnowballStemmer is vectorized, so stem all words in one call
      wordStem <- as.character(SnowballStemmer(flatText))
      hh <- data.frame(ff = flatText, sn = wordStem)
      ## dlply (data.frame to list apply) groups hh by the stem column sn
      ## and applies as.character(x$ff) to each group
      ## (x is the subset of the data.frame for one stem)
      stemList <- dlply(hh, .(sn), function(x) as.character(x$ff))
      stemList
    }
    
    stemList
    $I
    [1] "I"
    
    $a
    [1] "a"
    
    $activ
    [1] "active"     "activates"  "activation"
    
    $and
    [1] "and" "and"
    
    $be
    [1] "being"
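
As for locating the mistake in the original loop: the warnings all point at the line `stemList[length(stemList)+1][2] <- flatText[i]`, so that is a reasonable place to start. The sketch below is only an illustration, assuming the cause is the use of single brackets on a list (which return a sub-list, not the element). It reproduces the same assignment pattern on a toy list, shows a `[[ ]]` alternative, and uses `options(warn = 2)` to turn the first warning into an error so the run stops where the problem occurs. The names `stems`, `toy`, `seen`, `fixed`, `toy2` and `first_problem` are made up for this example and do not come from the question.

```r
# Single brackets on a list return a sub-list; double brackets return the element.
stems <- list("activ")
class(stems[1])    # "list"
class(stems[[1]])  # "character"

# Reproduce the assignment pattern from the question on a toy list and
# record any warning message instead of printing it.
toy <- list(NULL)
seen <- character(0)
withCallingHandlers(
  {
    toy[length(toy) + 1][2] <- "active"
  },
  warning = function(w) {
    seen <<- c(seen, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
seen  # the recorded warning text

# One possible alternative: keep each entry as a character vector
# (stem first, then the words) and address it with [[ ]].
fixed <- list()
fixed[[length(fixed) + 1]] <- c("activ", "active")
fixed[[1]] <- c(fixed[[1]], "activates")

# options(warn = 2) converts warnings into errors, so execution stops at the
# first offending statement instead of collecting warnings until interrupted.
old <- options(warn = 2)
first_problem <- tryCatch(
  {
    toy2 <- list(NULL)
    toy2[length(toy2) + 1][2] <- "active"
    "no error"
  },
  error = function(e) conditionMessage(e)
)
options(old)  # restore the previous warning level
first_problem
```

In an interactive session, running the original `stemMAP` call with `options(warn = 2)` set and then calling `traceback()` after it stops should show which statement raised the first warning.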