Summing a set of R dataframe rows (column-wise), while retaining the first n columns

I have a large R dataframe of which the first 21 columns are abiotic variables (incl. sample names), and column 22-72 are species with relative abundances as values. Due to processing of the data, each sample (i.e., col 1) has multiple rows for all species (with variable rel. ab. values). The abiotic variables of each of that row (of a sample) are identical.

I would like to sum the relative abundance values of each species per sample.

Below, you can find a example original dataframe and the desired outcome.

Original:

df <- data.frame(
  sample = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3),
  var1 = c(3,3,3,3,3,3,3,7,7,7,7,7,7,7,2,2,2,2,2,2),
  var2 = c(4,4,4,4,4,4,4,42,42,42,42,42,42,42,2,2,2,2,2,2),
  species1 = c(0,0,0.05,0,0,0.02,0,0,0,0,0,0,0,0,0,0.001,0.02,0.03,0.001,0),
  species2 = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.001,0.002,0.03,0,0,0)
)

desired outcome:

df_summed <- data.frame(
  sample = c(1, 2, 3),
  var1 = c(3, 7, 2),
  var2 = c(4, 42, 2),
  species1 = c(0.07, 0, 0.052),
  species2 = c(0, 0, 0.033)
)

I've tried multiple things with dplyr functions like group_by and summarise. For example:

df_summed <- df %>%
  group_by(across(1:21)) %>%
  summarise(across(22:ncol(df), sum), .groups = "drop")

but this gives me the error

Caused by error in `across()`:
! Can't subset columns past the end.
ℹ Locations 52, 53, 54, …, 71, and 72 don't exist.
ℹ There are only 51 columns.

while the df does have 72 columns... ncol(df) yields 72

Could anyone assist me how to perform this operation?

Solution

After your group_by, you need to adjust the summarise columns accordingly:

For your toy data, the number of columns to group on is 3 (1 + 2 vars). The columns to summarise on (excluding these grouping columns) is now 1:2, not 4:5.

df_summed <- df %>%
  group_by(across(1:3)) %>%
  summarise(across(1:2, sum), .groups = "drop")
# A tibble: 3 × 5
  sample  var1  var2 species1 species2
   <dbl> <dbl> <dbl>    <dbl>    <dbl>
1      1     3     4    0.07     0    
2      2     7    42    0        0    
3      3     2     2    0.052    0.033

Although Sam's answer advises against using numbers to select columns, this answer explains why yours failed.