dplyr, group-by, summarize

Is there an automated pipeline for summing columns?


I need to sum the values in many rows to produce a single row per group, across many columns. I processed a dataset with 52 samples (columns) and typed every column in by hand, but I will soon be working with a much larger dataset where entering the columns manually will not be reasonable. Here is a small example of what I did:

MTTAXA %>%
  group_by(MTmatch) %>%
  summarise(comb_S026401.R1 = sum(S026401.R1), comb_S026404.R1 = sum(S026404.R1),
            comb_S026406.R1 = sum(S026406.R1), comb_S026409.R1 = sum(S026409.R1),
            comb_S026412.R1 = sum(S026412.R1), comb_S026413.R1 = sum(S026413.R1)
            # ... and so on for the remaining samples
  )

I'm sure there's a simple and elegant solution.


Solution

  • Here is an example that might help you:

    library(dplyr)
    
    # Toy data: one grouping column and six numeric columns
    df <-
      data.frame(
        var_grp_1 = sample(LETTERS[1:3], 100, replace = TRUE),
        var_num_1 = rnorm(100),
        var_num_2 = rnorm(100),
        var_num_3 = rnorm(100),
        var_num_4 = rnorm(100),
        var_num_5 = rnorm(100),
        var_num_6 = rnorm(100)
      )
    
    # Sum every column whose name starts with "var_num" within each group;
    # .names prefixes each result column with "comb_"
    df %>% 
      group_by(var_grp_1) %>% 
      summarise(across(.cols = starts_with("var_num"), .fns = sum, .names = "comb_{.col}"))
    
    # Output from one run (no seed is set, so the values will differ between runs)
    # A tibble: 3 x 7
      var_grp_1 comb_var_num_1 comb_var_num_2 comb_var_num_3 comb_var_num_4 comb_var_num_5 comb_var_num_6
      <chr>              <dbl>          <dbl>          <dbl>          <dbl>          <dbl>          <dbl>
    1 A                  -3.96         -0.345           5.13           2.43         -2.78           0.120
    2 B                 -12.6          -3.99           -2.26          -1.03          0.313         -1.24 
    3 C                  -2.62          3.92           -1.44           1.27         -0.342          2.59
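
    Applied to the data in the question, the same pattern might look like the sketch below. The small MTTAXA table here is a made-up stand-in (the real one is not shown in the question), and the sketch assumes that MTmatch is a column of MTTAXA and that every sample column is numeric.

```r
library(dplyr)

# Hypothetical stand-in for the asker's MTTAXA table: one grouping column
# (MTmatch) and one numeric column per sample. The values are invented.
MTTAXA <- data.frame(
  MTmatch    = c("taxon_a", "taxon_a", "taxon_b", "taxon_b", "taxon_b"),
  S026401.R1 = c(1, 2, 3, 4, 5),
  S026404.R1 = c(10, 20, 30, 40, 50),
  S026406.R1 = c(0, 1, 0, 1, 0)
)

# Sum every numeric column within each MTmatch group and prefix the
# resulting column names with "comb_", without naming any sample by hand.
combined <- MTTAXA %>%
  group_by(MTmatch) %>%
  summarise(
    across(
      .cols  = where(is.numeric),
      .fns   = sum,
      .names = "comb_{.col}"
    )
  )

combined
```

    If the table also holds numeric columns that should not be summed, a name-based selector such as ends_with(".R1") in place of where(is.numeric) may be the safer choice, assuming all sample columns share that suffix.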