I have a dataset as such:
data.frame(ID = c("A1","A6","A3","A55","BC","J5","Ca", "KQF", "FK", "AAAA","ABBd","XXF"), Group = paste0("Group",c(1,1,1,1,1,2,2,2,2,2,1,2)))
ID Group
1 A1 Group1
2 A6 Group1
3 A3 Group1
4 A55 Group1
5 BC Group1
6 J5 Group2
7 Ca Group2
8 KQF Group2
9 FK Group2
10 AAAA Group2
11 ABBd Group1
12 XXF Group2
How can I create two sub-dataframes from the above data such that there are no repeats and there are exactly the same number of elements from Group1 and Group2 in each sub-dataframe? Both sub-dataframes combined are always identical to the original dataframe.
ID is always unique.
EXAMPLE RESULT
subDF1
ID Group
1 A1 Group1
4 A55 Group1
11 ABBd Group1
6 J5 Group2
8 KQF Group2
9 FK Group2
subDF2
ID Group
2 A6 Group1
3 A3 Group1
5 BC Group1
7 Ca Group2
10 AAAA Group2
12 XXF Group2
OK. I believe this is a correct way to do it. Within each group, a balanced vector of labels is shuffled without replacement and assigned to the rows. If a group has an odd number of elements, the two sub-dataframes differ by exactly one row for that group (SubDF1 gets the extra one).
library(dplyr)
x <- data.frame(ID = c("A1","A6","A3","A55","BC","J5","Ca", "KQF", "FK", "AAAA","ABBd","XXF"),
                Group = paste0("Group",c(1,1,1,1,1,2,2,2,2,2,1,2)))
x$SubDF <- NA
# For each group: build alternating labels of the group's length, then shuffle them
for (g in unique(x$Group)) {
  idx <- which(x$Group == g)
  x$SubDF[idx] <- sample(rep(c("SubDF1", "SubDF2"), length.out = length(idx)))
}
subDF1 <- x %>% dplyr::filter(SubDF == "SubDF1") %>% dplyr::select(-SubDF)
subDF2 <- x %>% dplyr::filter(SubDF == "SubDF2") %>% dplyr::select(-SubDF)
> subDF1
    ID  Group
1   A3 Group1
2   BC Group1
3   J5 Group2
4   FK Group2
5 AAAA Group2
6 ABBd Group1
> subDF2
   ID  Group
1  A1 Group1
2  A6 Group1
3 A55 Group1
4  Ca Group2
5 KQF Group2
6 XXF Group2
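Since the assignment is random, it can help to check that a split really meets the requirements from the question (no repeats, both parts together give back the original, same count per group in each part). Below is a minimal base R sketch of the same idea wrapped in a function, followed by those checks. The helper name `split_balanced` and its `group_col` argument are hypothetical, chosen only for this illustration, and `set.seed` is there just to make the example reproducible.

```r
# Hypothetical helper (not from any package): split a data frame into two
# parts so that each group is divided as evenly as possible between them.
split_balanced <- function(df, group_col = "Group") {
  half <- integer(nrow(df))
  for (g in unique(df[[group_col]])) {
    idx <- which(df[[group_col]] == g)
    n <- length(idx)
    # Alternating labels 1, 2, 1, 2, ... of length n, put in random order
    half[idx] <- rep(1:2, length.out = n)[sample.int(n)]
  }
  split(df, half)  # list with elements named "1" and "2"
}

set.seed(42)  # assumption: fixed seed only so the example is repeatable
x <- data.frame(ID = c("A1","A6","A3","A55","BC","J5","Ca", "KQF", "FK", "AAAA","ABBd","XXF"),
                Group = paste0("Group", c(1,1,1,1,1,2,2,2,2,2,1,2)),
                stringsAsFactors = FALSE)

parts  <- split_balanced(x)
subDF1 <- parts[["1"]]
subDF2 <- parts[["2"]]

# No ID appears in both parts, and together they cover the original data
stopifnot(length(intersect(subDF1$ID, subDF2$ID)) == 0)
stopifnot(setequal(c(subDF1$ID, subDF2$ID), x$ID))

# Per-group counts in each part (3 per group for this data)
table(subDF1$Group)
table(subDF2$Group)
```

Because `split` keeps the original row names, the parts show the row numbers from `x` rather than being renumbered from 1. With an odd group size, the part labelled "1" receives the extra row for that group.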