Removing stop words with tidytext

Using tidytext, I have this code:

data(stop_words)
tidy_documents <- tidy_documents %>%
      anti_join(stop_words)

I want it to use the stop words built into the package to write a dataframe called tidy_documents into a dataframe of the same name, but with the words removed if they are in stop_words.

I get this error:

Error: No common variables. Please specify by param. Traceback:

1. tidy_documents %>% anti_join(stop_words)
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(expr, envir, enclos)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. anti_join(., stop_words)
10. anti_join.tbl_df(., stop_words)
11. common_by(by, x, y)
12. stop("No common variables. Please specify `by` param.", call. = FALSE)

Solution

Both tidy_document and stop_words have a list of words listed under a column named word; however, the columns are inverted: in stop_words, it's the first column, while in your dataset it's the second column. That's why the command is unable to "match" the two columns and compare the words. Try this:

tidy_document <- tidy_document %>% 
      anti_join(stop_words, by = c("word" = "word"))

The by command forces the script to compare the columns that are called word, regardless their position.