Using tidytext, I have this code:
library(dplyr)
library(tidytext)
data(stop_words)
tidy_documents <- tidy_documents %>%
  anti_join(stop_words)
I want to use the stop words built into the package to overwrite the data frame tidy_documents with a copy of itself from which every word that appears in stop_words has been removed.
I get this error:
Error: No common variables. Please specify `by` param.
Traceback:
1. tidy_documents %>% anti_join(stop_words)
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3. eval(quote(`_fseq`(`_lhs`)), env, env)
4. eval(expr, envir, enclos)
5. `_fseq`(`_lhs`)
6. freduce(value, `_function_list`)
7. withVisible(function_list[[k]](value))
8. function_list[[k]](value)
9. anti_join(., stop_words)
10. anti_join.tbl_df(., stop_words)
11. common_by(by, x, y)
12. stop("No common variables. Please specify `by` param.", call. = FALSE)
Both tidy_documents and stop_words should have their words in a column named word; in stop_words it is the first column, while in your dataset it is the second. Note that anti_join matches columns by name, not by position, so the error means it found no column name shared by the two data frames when the join ran. Naming the join column explicitly makes the intent unambiguous. Try this:
tidy_documents <- tidy_documents %>%
  anti_join(stop_words, by = c("word" = "word"))
The by argument tells anti_join to compare the columns named word, regardless of their position.
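If the error persists, it may help to check what the columns are actually called before joining. Here is a minimal sketch on made-up data: the contents of tidy_documents and the column name token in the second data frame are hypothetical, chosen only to show both spellings of by.

```r
library(dplyr)
library(tidytext)

data(stop_words)  # columns: word, lexicon

# Hypothetical toy data: one row per (document, word) pair.
tidy_documents <- tibble(
  document = c(1, 1, 1, 2, 2),
  word     = c("the", "cat", "sat", "a", "dog")
)

# Inspect the column names; the join needs a name present in both.
names(tidy_documents)
names(stop_words)

# Both data frames have a `word` column, so by = "word" is enough.
cleaned <- tidy_documents %>%
  anti_join(stop_words, by = "word")

# Hypothetical variant: the words live in a column called `token`.
# The named vector maps the left-hand column to the right-hand one.
tidy_tokens <- rename(tidy_documents, token = word)
cleaned_tokens <- tidy_tokens %>%
  anti_join(stop_words, by = c("token" = "word"))
```

The form by = c("token" = "word") is the one to reach for when the two data frames use different names for the same information; when the names already agree, by = "word" and by = c("word" = "word") behave the same.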