
How to count rows by group with n() inside dplyr::across()?


In previous versions of dplyr, if I wanted row counts alongside other summary values from summarise(), I could do something like:

library(tidyverse)

df <- tibble(
    group = c("A", "A", "B", "B", "C"),
    value = c(1, 2, 3, 4, 5)
)

df %>%
    group_by(group) %>% 
    summarise(total = sum(value), count = n())

`summarise()` ungrouping output (override with `.groups` argument)

# A tibble: 3 x 3
  group total count
  <chr> <dbl> <int>
1 A         3     2
2 B         7     2
3 C         5     1

My instinct to get the same output using the new across() function would be

df %>%
  group_by(group) %>% 
  summarise(across(value, list(sum = sum, count = n)))
Error: Problem with `summarise()` input `..1`.
x unused argument (col)
ℹ Input `..1` is `across(value, list(sum = sum, count = n))`.
ℹ The error occurred in group 1: group = "A".

The issue is specific to the n() function; calling sum() alone works as expected:

df %>%
  group_by(group) %>% 
  summarise(across(value, list(sum = sum)))
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
  group value_sum
  <chr>     <dbl>
1 A             3
2 B             7
3 C             5

I've tried out various syntactic variations (using lambdas, experimenting with cur_group(), etc.), to no avail. How would I get the desired result within across()?


Solution

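  • across() calls each function in the list with the column as its first argument, but n() takes no arguments, hence the "unused argument (col)" error. Wrapping n() in a lambda (~ n()) discards the column, while sum can be passed by name as long as no other arguments need to be specified: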

    library(dplyr)
    df %>%
      group_by(group) %>% 
      summarise(across(value, list(sum = sum, count = ~ n())), .groups = 'drop')
    

    Output:

    # A tibble: 3 x 3
    #  group value_sum value_count
    #  <chr>     <dbl>       <int>
    #1 A             3           2
    #2 B             7           2
    #3 C             5           1
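
Two variations on the same idea may also be worth trying. This is a minimal sketch, assuming dplyr 1.0.0 or later and the df from the question; the result names res_fn and res_out are only illustrative. The first swaps the ~ lambda for an ordinary anonymous function, and the second keeps n() outside across(), where it needs no wrapper at all.

```r
library(dplyr)

# Same example data as in the question.
df <- tibble(
  group = c("A", "A", "B", "B", "C"),
  value = c(1, 2, 3, 4, 5)
)

# Variant 1: an ordinary anonymous function instead of the ~ lambda.
# across() passes the column in as x; the function ignores it and calls n().
res_fn <- df %>%
  group_by(group) %>%
  summarise(
    across(value, list(sum = sum, count = function(x) n())),
    .groups = "drop"
  )

# Variant 2: call n() next to across() rather than inside it.
# The count column is then named "count" instead of "value_count".
res_out <- df %>%
  group_by(group) %>%
  summarise(across(value, list(sum = sum)), count = n(), .groups = "drop")
```

Variant 2 is probably the better fit when across() covers several columns, because the count is then computed once per group instead of being repeated for every column.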