Tags: r, dplyr, tidyselect

Can I group_by columns with starts_with?


I'm dealing with a big dataframe that has a number of columns I want to group by. I'd like to do something like this:

output <- df %>% 
  group_by(starts_with("GEN", ignore.case=TRUE),x,y) %>% 
  summarize(total=n()) %>% 
  arrange(desc(total))

Is there a way to do this? Maybe with group_by_at() or some other similar function?


Solution

  • To use starts_with() in group_by(), you need to wrap it in across(). Here is an example using built-in data (the mtcars dataset).

    library(dplyr)
    mtcars %>%
      group_by(across(starts_with("c"))) %>%
      summarize(total = n()) %>%
      arrange(desc(total))
    
    # A tibble: 9 x 3
    # Groups:   cyl [3]
        cyl  carb total
      <dbl> <dbl> <int>
    1     4     2     6
    2     8     4     6
    3     4     1     5
    4     6     4     4
    5     8     2     4
    6     8     3     3
    7     6     1     2
    8     6     6     1
    9     8     8     1
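Applied to the shape of the question (some columns whose names start with "GEN", plus x and y), the same idea might look like the sketch below. The data frame is made up for illustration (the column names GEN_a, gen_b, x, y and other are hypothetical), and it assumes dplyr 1.0.0 or later, where across() works inside group_by() and summarize() accepts .groups. It also shows the superseded group_by_at() + vars() route the question asks about; both should produce the same counts.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's `df`: two "GEN"-prefixed
# columns (mixed case, to show case-insensitive matching) plus grouping columns x and y.
df <- tibble(
  GEN_a = c("p", "p", "q", "q", "p"),
  gen_b = c(1, 1, 2, 2, 1),
  x     = c("u", "u", "v", "v", "u"),
  y     = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  other = 1:5
)

# across() lets a tidyselect helper sit inside group_by() next to bare column names.
# starts_with() matches case-insensitively by default, so both GEN_a and gen_b are picked up.
output <- df %>%
  group_by(across(starts_with("GEN")), x, y) %>%
  summarize(total = n(), .groups = "drop") %>%
  arrange(desc(total))

# The superseded scoped verb mentioned in the question: group_by_at() with vars().
output_at <- df %>%
  group_by_at(vars(starts_with("GEN"), x, y)) %>%
  summarize(total = n(), .groups = "drop") %>%
  arrange(desc(total))

output
```

Since starts_with() already defaults to ignore.case = TRUE, passing it explicitly as in the question is harmless but not required. In dplyr 1.1.0 and later, pick() is another way to select grouping columns, e.g. group_by(pick(starts_with("GEN"), x, y)).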