Split data frame by group based on the occurance of an id

I'm trying to find a solution in base R that can split a data.frame in groups, based on values in a column (group), that are also dependent by the occurrence of values in another column (id).

For example:

I have a data.frame

df = data.frame(id = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,17,18,18,19,19,20,20), 
                group = c("A","A","A","A","A","A","A","A","B","B","B","B","C","C","C","C","A","B","A","B","A","B","A","B"),
                num = c(0.1,0.1,0.1,0.1,0.2,0.2,0.2,0.2,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1),
                value = c(sample(10:90, 16, replace = TRUE), rep(c(21,67,80,69), times=1, each=2)))

> df
   id group num value
1   1     A 0.1    59
2   2     A 0.1    72
3   3     A 0.1    82
4   4     A 0.1    17
5   5     A 0.2    39
6   6     A 0.2    46
7   7     A 0.2    39
8   8     A 0.2    56
9   9     B 0.1    31
10 10     B 0.1    46
11 11     B 0.1    63
12 12     B 0.1    15
13 13     C 0.1    51
14 14     C 0.1    68
15 15     C 0.1    48
16 16     C 0.1    28
17 17     A 0.1    21
18 17     B 0.1    21
19 18     A 0.1    67
20 18     B 0.1    67
21 19     A 0.1    80
22 19     B 0.1    80
23 20     A 0.1    69
24 20     B 0.1    69

and I'm trying to split the data.frame based on the following groups:

> df
$`A`
   id group num value
1   1     A 0.1    59
2   2     A 0.1    72
3   3     A 0.1    82
4   4     A 0.1    17
5   5     A 0.2    39
6   6     A 0.2    46
7   7     A 0.2    39
8   8     A 0.2    56

$`B`
   id group num value
9   9     B 0.1    31
10 10     B 0.1    46
11 11     B 0.1    63
12 12     B 0.1    15

$`C`
   id group num value
13 13     C 0.1    51
14 14     C 0.1    68
15 15     C 0.1    48
16 16     C 0.1    28

$`AB`
   id group num value
17 17     A 0.1    21
18 17     B 0.1    21
19 18     A 0.1    67
20 18     B 0.1    67
21 19     A 0.1    80
22 19     B 0.1    80
23 20     A 0.1    69
24 20     B 0.1    69

The last group is identified by pairs based on the column id. Meaning, if the id comes in pairs (17,18,19,20), the group is considered a separate group (AB) compared to the group (A and B) where the id dose not come in pairs (1:12).

How can this be accomplished with base R? Can this be done using the function split()?

Solution

You can first use ave to generate the groups you wanted, then split base on it.

To have the last group named as A + B, you'll need to collapse the ave results with paste with + (or obviously, sub the previous toString results). To have it being split at the end, the most convenient way I can think of is to use fct_inorder in the forcats package, otherwise, use the excellent codes shared by @Friede if you want to stay in base R.

split(df, forcats::fct_inorder(ave(df$group, df$id, FUN = \(x) paste(x, collapse = " + "))))

$A
  id group num value
1  1     A 0.1    78
2  2     A 0.1    66
3  3     A 0.1    18
4  4     A 0.1    81
5  5     A 0.2    35
6  6     A 0.2    16
7  7     A 0.2    51
8  8     A 0.2    18

$B
   id group num value
9   9     B 0.1    45
10 10     B 0.1    87
11 11     B 0.1    90
12 12     B 0.1    52

$C
   id group num value
13 13     C 0.1    85
14 14     C 0.1    24
15 15     C 0.1    41
16 16     C 0.1    16

$`A + B`
   id group num value
17 17     A 0.1    21
18 17     B 0.1    21
19 18     A 0.1    67
20 18     B 0.1    67
21 19     A 0.1    80
22 19     B 0.1    80
23 20     A 0.1    69
24 20     B 0.1    69