I'm trying to find a solution in base R that can split a data.frame into groups based on the values in one column (group), where the grouping also depends on the occurrence of values in another column (id).
For example, I have a data.frame:
df = data.frame(id = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,17,18,18,19,19,20,20),
group = c("A","A","A","A","A","A","A","A","B","B","B","B","C","C","C","C","A","B","A","B","A","B","A","B"),
num = c(0.1,0.1,0.1,0.1,0.2,0.2,0.2,0.2,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1),
value = c(sample(10:90, 16, replace = TRUE), rep(c(21,67,80,69), times=1, each=2)))
> df
id group num value
1 1 A 0.1 59
2 2 A 0.1 72
3 3 A 0.1 82
4 4 A 0.1 17
5 5 A 0.2 39
6 6 A 0.2 46
7 7 A 0.2 39
8 8 A 0.2 56
9 9 B 0.1 31
10 10 B 0.1 46
11 11 B 0.1 63
12 12 B 0.1 15
13 13 C 0.1 51
14 14 C 0.1 68
15 15 C 0.1 48
16 16 C 0.1 28
17 17 A 0.1 21
18 17 B 0.1 21
19 18 A 0.1 67
20 18 B 0.1 67
21 19 A 0.1 80
22 19 B 0.1 80
23 20 A 0.1 69
24 20 B 0.1 69
and I'm trying to split the data.frame based on the following groups:
> df
$`A`
id group num value
1 1 A 0.1 59
2 2 A 0.1 72
3 3 A 0.1 82
4 4 A 0.1 17
5 5 A 0.2 39
6 6 A 0.2 46
7 7 A 0.2 39
8 8 A 0.2 56
$`B`
id group num value
9 9 B 0.1 31
10 10 B 0.1 46
11 11 B 0.1 63
12 12 B 0.1 15
$`C`
id group num value
13 13 C 0.1 51
14 14 C 0.1 68
15 15 C 0.1 48
16 16 C 0.1 28
$`AB`
id group num value
17 17 A 0.1 21
18 17 B 0.1 21
19 18 A 0.1 67
20 18 B 0.1 67
21 19 A 0.1 80
22 19 B 0.1 80
23 20 A 0.1 69
24 20 B 0.1 69
The last group is identified by pairs in the id column. That is, if an id comes in pairs (17, 18, 19, 20), those rows form a separate group (AB), distinct from the groups (A and B) where the id does not come in pairs (1:12).
How can this be accomplished with base R? Can this be done using the function split()?
You can first use ave to generate the groups you want, then split based on the result.
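As a rough sketch of that first step, the snippet below builds a grouping key with ave on a small stand-in for the question's data (the shortened df and the names key and res are made up for illustration). For each id, all group labels seen for that id are pasted together, so ids that occur once keep their own label and paired ids get a combined one.

```r
# Small stand-in for the question's data (num and value columns omitted)
df <- data.frame(
  id    = c(1, 2, 9, 10, 13, 17, 17, 18, 18),
  group = c("A", "A", "B", "B", "C", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# For each id, paste together every group label that occurs for that id.
# The single pasted string is recycled across all rows of the id.
key <- ave(df$group, df$id, FUN = function(x) paste(x, collapse = ""))
key
# "A" "A" "B" "B" "C" "AB" "AB" "AB" "AB"

# split() turns the key into a factor, so the list follows the sorted
# factor levels rather than the order of appearance in the data
res <- split(df, key)
```

Because split() orders its result by factor level, the combined group does not necessarily come last here; controlling that order is a separate step.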
To have the last group named A + B, you'll need to collapse the ave results with paste, using " + " as the separator (or, alternatively, sub the previous toString results). To have that group appear at the end of the split, the most convenient way I can think of is to use fct_inorder from the forcats package; otherwise, use the excellent code shared by @Friede if you want to stay in base R.
split(df, forcats::fct_inorder(ave(df$group, df$id, FUN = \(x) paste(x, collapse = " + "))))
$A
id group num value
1 1 A 0.1 78
2 2 A 0.1 66
3 3 A 0.1 18
4 4 A 0.1 81
5 5 A 0.2 35
6 6 A 0.2 16
7 7 A 0.2 51
8 8 A 0.2 18
$B
id group num value
9 9 B 0.1 45
10 10 B 0.1 87
11 11 B 0.1 90
12 12 B 0.1 52
$C
id group num value
13 13 C 0.1 85
14 14 C 0.1 24
15 15 C 0.1 41
16 16 C 0.1 16
$`A + B`
id group num value
17 17 A 0.1 21
18 17 B 0.1 21
19 18 A 0.1 67
20 18 B 0.1 67
21 19 A 0.1 80
22 19 B 0.1 80
23 20 A 0.1 69
24 20 B 0.1 69
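For readers who want to avoid the forcats dependency, one base R possibility (not necessarily the approach referenced above) is to build the factor by hand, with levels in order of first appearance. This is only a sketch on a shortened stand-in for the data. The names key, f and res are illustrative, and sorting the labels inside each id is an added assumption so that a pair listed as B, A would still be labelled "A + B".

```r
# Shortened stand-in for the question's data (num and value columns omitted)
df <- data.frame(
  id    = c(1, 2, 9, 10, 13, 14, 17, 17, 18, 18),
  group = c("A", "A", "B", "B", "C", "C", "A", "B", "A", "B"),
  stringsAsFactors = FALSE
)

# Combined label per id; sort(unique(x)) makes the label independent of
# the row order within an id (an assumption, not part of the original answer)
key <- ave(df$group, df$id,
           FUN = function(x) paste(sort(unique(x)), collapse = " + "))

# Levels in order of first appearance, mimicking forcats::fct_inorder()
f <- factor(key, levels = unique(key))

res <- split(df, f)
names(res)
# "A" "B" "C" "A + B"
```

The combined group ends up last only because the paired ids come last in the data. If they appeared earlier, the levels would need to be given explicitly, for example levels = c("A", "B", "C", "A + B").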