relatively new to R and am trying to solve an issue with my syntax.
I have a dataframe that looks something like this:
dat = tibble::tribble(
~Sample, ~Motility, ~Diet, ~Life.Position, ~n, ~t1, ~t2,~t3,
'0m', "Slow moving" , "Carnivore", "Infaunal", 1,3,1,5,
'0m', "Fac. mobile" , "Filter feeder", "Epifaunal", 2,1,1,3,
'0m', "Stationary attached", "Suspension feeder","Epifaunal", 1,3,4,2,
'0m', "Fac. mobile" , "Filter feeder", "Deep infaunal", 2,4,1,3,
'0.5m', "Slow moving" , "Carnivore", "Infaunal", 1,3,1,4,
'0.5m', "Fac. mobile" , "Filter feeder", "Epifaunal", 2,0,1,1,
'0.5m', "Stationary attached", "Suspension feeder", "Epifaunal",1,3,0,3,
'0.5m', "Fac. mobile", "Filter feeder", "Deep infaunal", 2,4,2,3
)
In reality, I have 18 different samples and I have 100 different t columns that represent subsampled abundance data for species appearing in my test samples.
I have tried a couple of things, though due to my inexperience likely not the correct way or the most efficient way.
I am trying to take all 100 subsamples and sum the same traits (motility, diet, life.position) within each column so I get this result:
Sample | Motility | n | t1 | t2 | t3 | ... |
---|---|---|---|---|---|---|
0 m | Slow moving | 1 | 3 | 1 | 5 | ... |
0 m | Fac. mobile | 4 | 5 | 2 | 6 | ... |
0 m | Stationary, attached | 1 | 3 | 4 | 2 | ... |
0.5 m | Slow moving | 1 | 3 | 1 | 3 | ... |
0.5 m | Fac. mobile | 4 | 4 | 3 | 4 | ... |
0.5 m | Stationary, attached | 1 | 3 | 0 | 3 | ... |
For example:
df %>%
group_by(Sample, Motility) %>%
summarise(trial = sum(across(t1:t100)))
This gave me 1 column with sums of each trait (nice) but summing all 100 trials together (not nice). I understand that a for loop might be able to help me out however my understanding of them and their functionality is extremely poor.
Sample | Motility | trial |
---|---|---|
0 m | Slow moving | 60 |
0 m | Fac. mobile | 35 |
0.5 m | Slow moving | 42 |
0.5 m | Fac. mobile | 26 |
0.5 m | Stationary, attached | 48 |
Any more information that you may need I can absolutely provide.
Thank you
The across
helper function can be used in this way:
df |>
group_by(Sample, Motility) |>
summarise(across(t1:t3, sum))
# A tibble: 6 × 5 # Groups: Sample [2] Sample Motility t1 t2 t3 <chr> <chr> <int> <int> <int> 1 0 m Fac. mobile 5 2 6 2 0 m Slow moving 3 1 5 3 0 m Stationary, attached 3 4 2 4 0.5 m Fac. mobile 4 3 4 5 0.5 m Slow moving 3 1 4 6 0.5 m Stationary, attached 3 0 3
Another common way to tackle a problem like this, is to first use pivot_longer
to change the data into "long" format, then summarise after:
df |>
pivot_longer(t1:t3, names_to = 't', values_to = 'value') |>
group_by(Sample, Motility, t) |>
summarise(sum = sum(value))
# A tibble: 18 × 4
# Groups: Sample, Motility [6]
Sample Motility t sum
<chr> <chr> <chr> <int>
1 0 m Fac. mobile t1 5
2 0 m Fac. mobile t2 2
3 0 m Fac. mobile t3 6
4 0 m Slow moving t1 3
5 0 m Slow moving t2 1
6 0 m Slow moving t3 5
7 0 m Stationary, attached t1 3
8 0 m Stationary, attached t2 4
9 0 m Stationary, attached t3 2
10 0.5 m Fac. mobile t1 4
11 0.5 m Fac. mobile t2 3
12 0.5 m Fac. mobile t3 4
13 0.5 m Slow moving t1 3
14 0.5 m Slow moving t2 1
15 0.5 m Slow moving t3 4
16 0.5 m Stationary, attached t1 3
17 0.5 m Stationary, attached t2 0
18 0.5 m Stationary, attached t3 3