
Removing all stopwords except "you", "yours", "me", "mine"


I am trying to remove all English stopwords except "you/yours" and "me/mine", because those are important to consider for my analysis. Can someone please help me with this? I am very new to R. I know that I can remove stopwords with the following code:

corpus <- tm_map(corpus, removeWords, stopwords("english"))

... but I have no clue how to keep the words I need.


Solution

  • You can extract the strings from stopwords("english") and drop the ones you wish to keep, so that they are not removed from the corpus. Here is an example using dplyr.

    library(tm)
    library(dplyr)
    
    # stopwords that should stay in the corpus
    words_to_keep <- c("you", "your", "yours", "me", "mine")
    
    my_stopwords <- data.frame(words = stopwords("english"), stringsAsFactors = FALSE) %>% # make into a data frame
      filter(!(words %in% words_to_keep)) %>% # drop the words you want to keep
      pull(words) # turn the column back into a character vector
    
    corpus <- tm_map(corpus, removeWords, my_stopwords)
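The same idea can probably be written without dplyr. The sketch below uses base R's setdiff() to take the kept words out of the stopword vector. The two-sentence corpus is made up purely for illustration and stands in for your own corpus object.

```r
library(tm)

# Words that should survive stopword removal
words_to_keep <- c("you", "your", "yours", "me", "mine")

# setdiff() returns the stopwords that are not in words_to_keep
my_stopwords <- setdiff(stopwords("english"), words_to_keep)

# Hypothetical two-document corpus, only for illustration
corpus <- VCorpus(VectorSource(c("this is mine and that is yours",
                                 "you gave it to me")))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeWords, my_stopwords)

# Inspect the cleaned text of each document
sapply(corpus, as.character)
```

Because setdiff() works directly on character vectors, there is no need to convert to a data frame and back. Words in words_to_keep that are not in the stopword list to begin with are simply ignored.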