Remove custom stopwords and phrases using quanteda

I have a custom stopword list that I would like to use to remove specific words and phrases from text:

# dummy text
df2 <- c("hi my name is Ann and code code all the time! However not after that I would like")

mystopwords <- c("hi", "code code", "not after that")

I use this option:

myDfm <- df2 %>%
  tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>%
  tokens_remove(pattern = c(stopwords(source = "smart"), mystopwords)) %>%
  tokens_wordstem() %>%
  tokens_ngrams(n = c(1, 3)) %>%
  dfm()

But when I check the frequencies of the bigrams or trigrams, the phrases have not been removed, only stemmed.

Is there anything wrong with the syntax?

Solution

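You can achieve this by wrapping the list of stop-phrases in phrase(). Without it, tokens_remove() compares each pattern with individual tokens, so a multi-word entry such as "code code" never matches a single token and is left in place.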

It works like this:

library(quanteda)
df2 <- c("hi my name is Ann and code code all the time! However not after that I would like")

mystopwords <- c("hi", "code code", "not after that")

df2 %>% tokens() %>%
  tokens_remove(pattern = phrase(mystopwords), valuetype = 'fixed')

## tokens from 1 document.
## text1 :
##  [1] "my"      "name"    "is"      "Ann"     "and"     "all"     "the"     "time"    "!"       "However" "I"       "would"  
## [13] "like"
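
To make the difference concrete, here is a minimal sketch that runs the same removal with and without phrase() on the example text. The object names toks, plain and phr are only illustrative.

```r
library(quanteda)

toks <- tokens("hi my name is Ann and code code all the time! However not after that I would like")
mystopwords <- c("hi", "code code", "not after that")

# Without phrase(): every pattern is compared with one token at a time,
# so only the single-word entry "hi" can match
plain <- tokens_remove(toks, pattern = mystopwords, valuetype = "fixed")

# With phrase(): multi-word entries are split on whitespace and
# matched as sequences of adjacent tokens
phr <- tokens_remove(toks, pattern = phrase(mystopwords), valuetype = "fixed")

"code" %in% as.list(plain)[[1]]  # "code" is still present
"code" %in% as.list(phr)[[1]]    # "code" has been removed
```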

More detail on working with multi-word expressions in quanteda is available here: https://quanteda.io/articles/pkgdown/examples/phrase.html
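
Applied to the full pipeline from the question, the fix might look like the sketch below. It assumes the custom phrases should be removed before the SMART stopwords and before stemming, so that they are matched against the original, unaltered token sequence; the steps are written as plain assignments rather than a pipe, but the calls are the same.

```r
library(quanteda)

df2 <- "hi my name is Ann and code code all the time! However not after that I would like"
mystopwords <- c("hi", "code code", "not after that")

toks <- tokens(df2, remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE)

# Remove the custom words and phrases first, while the token order is intact
toks <- tokens_remove(toks, pattern = phrase(mystopwords), valuetype = "fixed")

# Then remove the single-word SMART stopwords
toks <- tokens_remove(toks, pattern = stopwords(source = "smart"))

# Stem and build n-grams only after all removals are done
toks <- tokens_wordstem(toks)
toks <- tokens_ngrams(toks, n = c(1, 3))

myDfm <- dfm(toks)
featnames(myDfm)
```

Note that n = c(1, 3) produces unigrams and trigrams only; use n = 1:3 if bigrams are wanted as well.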