Search code examples
rdplyrsummarize

Dplyr Summarise Groups as Column Names


I got a data frame with a lot of columns and want to summarise them with multiple functions.

test_df <- data.frame(Group = sample(c("A", "B", "C"), 10, T), var1 = sample(1:5, 10, T), var2 = sample(3:7, 10, T))

test_df %>% 
  group_by(Group) %>% 
  summarise_all(c(Mean = mean, Sum = sum))

    # A tibble: 3 x 5
  Group var1_Mean var2_Mean var1_Sum var2_Sum
  <chr>     <dbl>     <dbl>    <int>    <int>
1 A          3.14      5.14       22       36
2 B          4.5       4.5         9        9
3 C          4         6           4        6

This results in a tibble with the first row Group and column names with a combination of the previous column name and the function name. The desired result is a table with the previous column names as first row and the groups and functions in the column names.

I can achive this with

test_longer <- test_df %>% pivot_longer(cols = starts_with("var"), names_to = "var", values_to = "val")

# Add row number because spread needs unique identifiers for rows
test_longer <- test_longer %>% 
  group_by(Group) %>% 
  mutate(grouped_id = row_number())

spread(test_longer, Group, val) %>% 
  select(-grouped_id) %>% 
  group_by(var) %>% 
  summarise_all(c(Mean = mean, Sum = sum), na.rm = T)

    # A tibble: 2 x 7
  var   A_Mean B_Mean C_Mean A_Sum B_Sum C_Sum
  <chr>  <dbl>  <dbl>  <dbl> <int> <int> <int>
1 var1    3.14    4.5      4    22     9     4
2 var2    5.14    4.5      6    36     9     6

But this seems to be a rather long detour... There probably is a better way, but I could not find it. Any suggestions? Thank you


Solution

  • There's lots of ways to go about it, but I would simplify it by pivoting to a longer data frame initially, and then grouping by var and group. Then you can just pivot wider to get the final result you want. Note that I used summarize(across()) which replaces the deprecated summarize_all(), even though with a single column could've just manually specified Mean = ... and Sum = ....

    set.seed(123)
    
    test_df %>%
      pivot_longer(
        var1:var2,
        names_to = "var"
      ) %>%
      group_by(Group, var) %>%
      summarize(
        across(
          everything(),
          list(Mean = mean, Sum = sum),
          .names = "{.fn}"
        ),
        .groups = "drop"
      ) %>%
      pivot_wider(
        names_from = "Group",
        values_from = c(Mean, Sum),
        names_glue = "{Group}_{.value}"
      )
    #> # A tibble: 2 × 7
    #>   var   A_Mean B_Mean C_Mean A_Sum B_Sum C_Sum
    #>   <chr>  <dbl>  <dbl>  <dbl> <int> <int> <int>
    #> 1 var1       1    2.5    3.2     1    10    16
    #> 2 var2       5    4.5    4.4     5    18    22