Merge rows of data frame that satisfy condition (belong to a group) by column (id)

I've got a 300x5 data frame that looks like this (3 groups, 100 rows per group, ids go from 01 to 100 per group, same id means same feat2 and feat3 but always different num_to_sum):

factor/group	id	num_to_sum	feat2	feat3
group1	01	4	...	...
group1	02	9	...	...
group2	01	3	...	...
group2	02	1	...	...
group3	01	4	...	...
group3	02	8	...	...

Now, I'd like to merge all rows by id but only if they belong to a certain group. For example: I want to merge group1 and group2 by id and apply the sum operator to num_to_sum, so the final data frame (output) shall look like this:

factor/group	id	summed_num	feat2	feat3
group12	01	(4+3=)7	...	...
group12	02	(9+1=)10	...	...
group3	01	4	...	...
group3	02	8	...	...

I don't mind about the other features (feat2 and feat3); they are actually the same for each id regardless of the group, so I could drop them, I just don't want them duplicated in the final df as that would mean duplicated data and that's messy.

It'd be great if there is a solution that can be applied to not just two groups, but to n groups that shall be merged.

I've looked at stats::aggregate, but I can't figure out how to aggregate with a condition.

Solution

You should create a new factor that combines the groups you want to sum, then aggregate with that and the id.

With base R:

tab <- read.table(text = "
factor/group    id  num_to_sum
group1  01  4
group1  02  9
group2  01  3
group2  02  1
group3  01  4
group3  02  8
", header = TRUE)

tab
#>   factor.group id num_to_sum
#> 1       group1  1          4
#> 2       group1  2          9
#> 3       group2  1          3
#> 4       group2  2          1
#> 5       group3  1          4
#> 6       group3  2          8

grp <- factor(tab$factor.group)
levels(grp) <- list(group12 = c("group1", "group2"), group3 = "group3")
grp
#> [1] group12 group12 group12 group12 group3  group3 
#> Levels: group12 group3

aggregate(num_to_sum ~ id + grp, tab, sum)
#>   id     grp num_to_sum
#> 1  1 group12          7
#> 2  2 group12         10
#> 3  1  group3          4
#> 4  2  group3          8

With forcats and dplyr:

library(dplyr, warn.conflicts = FALSE)
library(forcats)

tab |>
  mutate(grp = fct_collapse(factor.group, group12 = c("group1", "group2"))) |>
  group_by(grp, id) |>
  summarise(num_to_sum = sum(num_to_sum))
#> `summarise()` has grouped output by 'grp'. You can override using the `.groups`
#> argument.
#> # A tibble: 4 × 3
#> # Groups:   grp [2]
#>   grp        id num_to_sum
#>   <fct>   <int>      <int>
#> 1 group12     1          7
#> 2 group12     2         10
#> 3 group3      1          4
#> 4 group3      2          8