Tags: r, text, tf-idf, tidytext

Error in R term frequency analysis (TF-IDF)


I tried to run the following code on the Jane Austen novels from the janeaustenr package:

library(dplyr)
library(janeaustenr)
library(tidytext)

book_words <- austen_books() %>%
 unnest_tokens(word, text) %>%
 count(book, word, sort = TRUE)

Running it gives this error message:

Error in count(., book, word, sort = TRUE) : 
  unused argument (sort = TRUE)

What do I have to change for the code to work?


Solution

  • count from dplyr has most likely been masked by another loaded package that also defines a function named count. Call it with its namespace prefix, dplyr::count, so the dplyr version is used regardless of load order:

    austen_books() %>%
      unnest_tokens(word, text) %>% 
      dplyr::count(book, word, sort = TRUE)
    # A tibble: 40,379 × 3
       book              word      n
       <fct>             <chr> <int>
     1 Mansfield Park    the    6206
     2 Mansfield Park    to     5475
     3 Mansfield Park    and    5438
     4 Emma              to     5239
     5 Emma              the    5201
     6 Emma              and    4896
     7 Mansfield Park    of     4778
     8 Pride & Prejudice the    4331
     9 Emma              of     4291
    10 Pride & Prejudice to     4162
    # … with 40,369 more rows
    

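    For example, if plyr is loaded after dplyr, its functions mask the dplyr functions with the same name (count, arrange, mutate, summarise and others). plyr::count has no sort argument, which produces exactly the error from the question: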

    > austen_books() %>%
    +   unnest_tokens(word, text) %>% 
    +   plyr::count(book, word, sort = TRUE)
    Error in plyr::count(., book, word, sort = TRUE) : 
      unused argument (sort = TRUE)
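A minimal sketch for checking which count you are actually calling and undoing the masking, assuming plyr is the culprit as in the example above. find() (from base R's utils package) lists every attached package that provides an object named count, in search-path order, so the first entry is the one a bare count() call will use; identical() confirms whether the bare name resolves to dplyr's function. Detaching plyr is only one option and assumes nothing else in your session needs it attached; the dplyr::count prefix works either way.

```r
library(dplyr)
library(plyr)        # attached after dplyr, so plyr::count now masks dplyr::count
library(janeaustenr)
library(tidytext)

# List attached packages that provide "count", in search order.
# The first entry is the version a bare count() call resolves to.
print(find("count"))

# FALSE here, because plyr's count is found first on the search path
print(identical(count, dplyr::count))

# Fix 1: qualify the call with its namespace; this works regardless of load order
book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  dplyr::count(book, word, sort = TRUE)

# Fix 2: remove plyr from the search path if it is not needed in this session,
# so that a bare count() resolves to dplyr's version again
detach("package:plyr")
print(identical(count, dplyr::count))
```

If you need both packages, loading plyr before dplyr lets dplyr's functions take precedence, but the explicit dplyr:: prefix is the more robust choice.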