Search code examples
r

How can I calculate sums of multiple columns without them being summed together into 1 column?


relatively new to R and am trying to solve an issue with my syntax.

I have a dataframe that looks something like this:

dat = tibble::tribble(
  ~Sample,  ~Motility,    ~Diet, ~Life.Position, ~n, ~t1, ~t2,~t3,
   '0m',     "Slow moving" ,   "Carnivore", "Infaunal", 1,3,1,5,
   '0m',     "Fac. mobile" ,   "Filter feeder", "Epifaunal", 2,1,1,3,
   '0m',     "Stationary attached", "Suspension feeder","Epifaunal", 1,3,4,2,
   '0m',     "Fac. mobile" ,  "Filter feeder", "Deep infaunal", 2,4,1,3,
   '0.5m',     "Slow moving" ,  "Carnivore",  "Infaunal", 1,3,1,4,
   '0.5m',     "Fac. mobile" , "Filter feeder", "Epifaunal", 2,0,1,1,
   '0.5m',     "Stationary attached",  "Suspension feeder", "Epifaunal",1,3,0,3,
   '0.5m',     "Fac. mobile",  "Filter feeder", "Deep infaunal", 2,4,2,3
   )

In reality, I have 18 different samples and I have 100 different t columns that represent subsampled abundance data for species appearing in my test samples.

I have tried a couple of things, though due to my inexperience likely not the correct way or the most efficient way.

I am trying to take all 100 subsamples and sum the same traits (motility, diet, life.position) within each column so I get this result:

Sample Motility n t1 t2 t3 ...
0 m Slow moving 1 3 1 5 ...
0 m Fac. mobile 4 5 2 6 ...
0 m Stationary, attached 1 3 4 2 ...
0.5 m Slow moving 1 3 1 3 ...
0.5 m Fac. mobile 4 4 3 4 ...
0.5 m Stationary, attached 1 3 0 3 ...

For example:

df %>%
   group_by(Sample, Motility) %>%
     summarise(trial = sum(across(t1:t100)))

This gave me 1 column with sums of each trait (nice) but summing all 100 trials together (not nice). I understand that a for loop might be able to help me out however my understanding of them and their functionality is extremely poor.

Sample Motility trial
0 m Slow moving 60
0 m Fac. mobile 35
0.5 m Slow moving 42
0.5 m Fac. mobile 26
0.5 m Stationary, attached 48

Any more information that you may need I can absolutely provide.

Thank you


Solution

  • The across helper function can be used in this way:

    df |>
     group_by(Sample, Motility) |>
     summarise(across(t1:t3, sum))
    
    # A tibble: 6 × 5
    # Groups:   Sample [2]
      Sample Motility                t1    t2    t3
      <chr>  <chr>                <int> <int> <int>
    1 0 m    Fac. mobile              5     2     6
    2 0 m    Slow moving              3     1     5
    3 0 m    Stationary, attached     3     4     2
    4 0.5 m  Fac. mobile              4     3     4
    5 0.5 m  Slow moving              3     1     4
    6 0.5 m  Stationary, attached     3     0     3
    

    Another common way to tackle a problem like this, is to first use pivot_longer to change the data into "long" format, then summarise after:

    df |> 
      pivot_longer(t1:t3, names_to = 't', values_to = 'value') |> 
      group_by(Sample, Motility, t) |> 
      summarise(sum = sum(value))
    
    # A tibble: 18 × 4
    # Groups:   Sample, Motility [6]
       Sample Motility             t       sum
       <chr>  <chr>                <chr> <int>
     1 0 m    Fac. mobile          t1        5
     2 0 m    Fac. mobile          t2        2
     3 0 m    Fac. mobile          t3        6
     4 0 m    Slow moving          t1        3
     5 0 m    Slow moving          t2        1
     6 0 m    Slow moving          t3        5
     7 0 m    Stationary, attached t1        3
     8 0 m    Stationary, attached t2        4
     9 0 m    Stationary, attached t3        2
    10 0.5 m  Fac. mobile          t1        4
    11 0.5 m  Fac. mobile          t2        3
    12 0.5 m  Fac. mobile          t3        4
    13 0.5 m  Slow moving          t1        3
    14 0.5 m  Slow moving          t2        1
    15 0.5 m  Slow moving          t3        4
    16 0.5 m  Stationary, attached t1        3
    17 0.5 m  Stationary, attached t2        0
    18 0.5 m  Stationary, attached t3        3