I am trying to create bigrams with both words stemmed. But my code is only stemming the second word, leaving the first word unstemmed. So, for example, "worrying about" and "worry about" are listed separately.
Any assistance would be appreciated.
bigram_text <- text_df %>%
  mutate_all(as.character) %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  mutate(bigram = wordStem(bigram))

bigramcount <- bigram_text %>%
  count(bigram, sort = TRUE)
The problem you face is that wordStem, like a lot of other stemmers, only stems single words. It treats each bigram as one word, so only the ending of the string (the second word) gets stemmed. You want to stem a bigram, which is two words, so you need a function that can stem whole strings. In this case you can use stem_strings from the package textstem.
library(dplyr)
library(tidytext)
library(textstem)

bigram_text <- text_df %>%
  mutate_all(as.character) %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  # stem_strings stems every word inside each string
  mutate(bigram = stem_strings(bigram))
Of course a more roundabout way would be to split the bigram into 2 columns, stem the columns and then paste them back together.
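A minimal sketch of that roundabout approach, assuming tidyr's separate and unite for the split and paste steps and SnowballC's wordStem for the stemming. The two-row text_df below is hypothetical sample data standing in for yours:

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(SnowballC)

# Hypothetical sample data; replace with your own text_df
text_df <- tibble(text = c("worrying about", "worry about"))

bigram_text <- text_df %>%
  mutate_all(as.character) %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  # split each bigram into its two words
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  # stem each word on its own
  mutate(word1 = wordStem(word1),
         word2 = wordStem(word2)) %>%
  # paste the stemmed words back into a single bigram column
  unite(bigram, word1, word2, sep = " ")

bigramcount <- bigram_text %>%
  count(bigram, sort = TRUE)
```

With this sample data both rows should end up as the same stemmed bigram, so they are counted together instead of separately.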