
Error while using n() inside summarise_at()


While using n() within summarise_at(), I obtain this error:

Error: n() should only be called in a data context
Call `rlang::last_error()` to see a backtrace

Others have suggested this could be caused by plyr masking dplyr functions. Two proposed solutions are:

  1. Replace summarise_at() with `dplyr::summarise_at()`
  2. Call detach("package:plyr", unload=TRUE)

Neither has removed this error, and I'm curious to understand what is causing it. Here is a reproducible example that should produce the same error:

library(dplyr)

Df <- data.frame(
  Condition = c(rep("No", 20), rep("Yes",20)),
  Height = c(rep(1,10),rep(2,10),rep(1,10),rep(2,10)),
  Weight = c(rep(10,5),rep(20,5),rep(30,5), rep(40,5))
)

x <- c("Height","Weight")

Df %>% 
  group_by(Condition) %>% 
  summarise_at(vars(one_of(x)), c(mean = mean, sd = sd, count = n()))

Note: if you remove `count = n()`, the code runs without any issue.


Solution

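  • I believe it is because `count = n()` calls n() immediately, while the list of functions is being built and before summarise_at() runs. n() takes no arguments and only works inside a data context such as mutate(), filter(), or summarise(), so it cannot be passed the way mean and sd are. Use length instead, which is a function that summarise_at() can apply to each column.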

    Df %>% 
      group_by(Condition) %>% 
      summarise_at(vars(one_of(x)), c(mean = mean, sd = sd, count = length))
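
    To illustrate the point about n() being called too early, here is a minimal sketch that reuses the Df and x from the question. It first shows that building the function list with a bare n() fails on its own, then defers the call with a formula (`~ n()`) so it only runs inside summarise_at(). The names msg and res are made up for this example, and the formula form assumes a dplyr version that accepts lambdas in summarise_at() (0.8 or later, as far as I know).

    ```r
    library(dplyr)

    Df <- data.frame(
      Condition = c(rep("No", 20), rep("Yes", 20)),
      Height = c(rep(1, 10), rep(2, 10), rep(1, 10), rep(2, 10)),
      Weight = c(rep(10, 5), rep(20, 5), rep(30, 5), rep(40, 5))
    )
    x <- c("Height", "Weight")

    # n() is evaluated here, outside any dplyr verb, so this line alone errors.
    # tryCatch() captures the message instead of stopping the script.
    msg <- tryCatch(
      c(mean = mean, sd = sd, count = n()),
      error = function(e) conditionMessage(e)
    )
    print(msg)

    # Wrapping the call in a formula delays it until summarise_at() evaluates
    # it per group, where n() has a data context.
    res <- Df %>%
      group_by(Condition) %>%
      summarise_at(vars(one_of(x)), list(mean = mean, sd = sd, count = ~ n()))
    print(res)
    ```

    Like the length version, this still produces one count column per summarised variable (Height_count and Weight_count).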
    

    If you only want one count column, then:

    Df %>% 
      group_by(Condition) %>%
      mutate(count = n()) %>%
      group_by(Condition, count) %>%
      summarise_at(vars(one_of(x)), c(mean = mean, sd = sd))
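
    As an alternative to the mutate() and regroup step above, the single count column can also be sketched with across(), assuming dplyr 1.0 or later is available (across() and all_of() are not used in the original question, and res is a name chosen for this example). Because n() is written directly inside summarise(), it is evaluated in a data context and returns the group size once per group.

    ```r
    library(dplyr)

    Df <- data.frame(
      Condition = c(rep("No", 20), rep("Yes", 20)),
      Height = c(rep(1, 10), rep(2, 10), rep(1, 10), rep(2, 10)),
      Weight = c(rep(10, 5), rep(20, 5), rep(30, 5), rep(40, 5))
    )
    x <- c("Height", "Weight")

    res <- Df %>%
      group_by(Condition) %>%
      summarise(
        # mean and sd are passed as functions and applied to each column in x
        across(all_of(x), list(mean = mean, sd = sd)),
        # n() is called inside summarise(), so it sees the current group
        count = n()
      )
    print(res)
    ```

    This should give one row per Condition with Height_mean, Height_sd, Weight_mean, Weight_sd, and a single count column.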