I have a dataset with duplicates across groups. For instance:
dat <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  values = c("duplicate1", "duplicate2", 3, "duplicate1",
             5, "duplicate1", "duplicate2", 6)
)
My expected output is a list of N datasets, one for each unique combination of which group keeps each duplicated value, so that every duplicate is kept in exactly one group:
dfs <- list(df1, df2, df3, df4, df5, df6)
dfs[[1]] ## Combination 1
group values
1 A duplicate1
2 A duplicate2
3 A 3
4 B 5
5 C 6
dfs[[2]] ## Combination 2
group values
1 A duplicate2
2 A 3
3 B 5
4 B duplicate1
5 C 6
dfs[[3]] ## Combination 3
group values
1 A duplicate2
2 A 3
3 B 5
4 C 6
5 C duplicate1
dfs[[4]] ## Combination 4
group values
1 A duplicate1
2 A 3
3 B 5
4 C 6
5 C duplicate2
dfs[[5]] ## Combination 5
group values
1 A 3
2 B 5
3 B duplicate1
4 C 6
5 C duplicate2
dfs[[6]] ## Combination 6
group values
1 A 3
2 B 5
3 C 6
4 C duplicate1
5 C duplicate2
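For reference, N can be worked out from the data: each value is kept in exactly one of the groups it occurs in, so N should be the product of the number of candidate groups per value. A minimal sketch in base R, where `groups_per_value` and `n_combinations` are just illustrative names:

```r
dat <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  values = c("duplicate1", "duplicate2", 3, "duplicate1",
             5, "duplicate1", "duplicate2", 6)
)

# Groups in which each value occurs (values are stored as character here)
groups_per_value <- lapply(split(dat$group, dat$values), unique)

# duplicate1 has 3 candidate groups, duplicate2 has 2, all other values have 1
n_combinations <- prod(lengths(groups_per_value))
n_combinations
#> [1] 6
```

This matches the six datasets listed above (3 choices for duplicate1 times 2 choices for duplicate2).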
I thought I had a solution in "Find all unique combinations of removing a duplicate in groups from a data set".
However, that solution does not work when a duplicate appears in more than two groups, as in the example above. It removes only one copy of the duplicate from the data frame, so a combination can still contain the same value twice, for instance duplicate1 kept in group B or C as well.
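To make the failure concrete, here is a small sketch of the condition a valid dataset has to satisfy. The helper `keeps_each_value_once()` is a hypothetical name, and `bad` is a hand-built illustration of the problem, not actual output of the linked solution:

```r
# Hypothetical helper: TRUE when no value is kept in more than one row
keeps_each_value_once <- function(df) anyDuplicated(df$values) == 0L

# Hand-built example of the problem: duplicate1 survives in both B and C
bad <- data.frame(
  group  = c("A", "A", "B", "B", "C", "C"),
  values = c("duplicate2", "3", "duplicate1", "5", "duplicate1", "6")
)

# Combination 2 from the expected output: duplicate1 is kept only in B
good <- data.frame(
  group  = c("A", "A", "B", "B", "C"),
  values = c("duplicate2", "3", "5", "duplicate1", "6")
)

keeps_each_value_once(bad)
#> [1] FALSE
keeps_each_value_once(good)
#> [1] TRUE
```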
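library(dplyr)  # needs dplyr >= 1.1.0 for .by and R >= 4.1 for \(x)

dat %>%
  # one row per value, with all groups it occurs in stored in a list column
  summarise(group = list(group), .by = values) %>%
  # expand.grid() enumerates every way of assigning each value to one of its groups
  {apply(expand.grid(.$group), 1, \(x)
    data.frame(group = x, values = .$values, row.names = NULL) %>%
      arrange(group))}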
#> [[1]]
#> group values
#> 1 A duplicate1
#> 2 A duplicate2
#> 3 A 3
#> 4 B 5
#> 5 C 6
#>
#> [[2]]
#> group values
#> 1 A duplicate2
#> 2 A 3
#> 3 B duplicate1
#> 4 B 5
#> 5 C 6
#>
#> [[3]]
#> group values
#> 1 A duplicate2
#> 2 A 3
#> 3 B 5
#> 4 C duplicate1
#> 5 C 6
#>
#> [[4]]
#> group values
#> 1 A duplicate1
#> 2 A 3
#> 3 B 5
#> 4 C duplicate2
#> 5 C 6
#>
#> [[5]]
#> group values
#> 1 A 3
#> 2 B duplicate1
#> 3 B 5
#> 4 C duplicate2
#> 5 C 6
#>
#> [[6]]
#> group values
#> 1 A 3
#> 2 B 5
#> 3 C duplicate1
#> 4 C duplicate2
#> 5 C 6
Created on 2024-04-22 with reprex v2.0.2
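The same idea can also be sketched in base R, in case dplyr is not available. This is only an illustration of the expand.grid() approach under the assumption that each value should end up in exactly one of its groups; the names `groups_per_value`, `grid` and `dfs` are illustrative, and the row order within a group may differ from the output above because split() orders the values alphabetically.

```r
dat <- data.frame(
  group = c("A", "A", "A", "B", "B", "C", "C", "C"),
  values = c("duplicate1", "duplicate2", 3, "duplicate1",
             5, "duplicate1", "duplicate2", 6)
)

# Candidate groups for every value
groups_per_value <- lapply(split(dat$group, dat$values), unique)

# One row per way of assigning each value to exactly one of its groups
grid <- expand.grid(groups_per_value, stringsAsFactors = FALSE)

# Build one data frame per row of the grid, sorted by group
dfs <- lapply(seq_len(nrow(grid)), function(i) {
  out <- data.frame(
    group = unlist(grid[i, ], use.names = FALSE),
    values = names(groups_per_value)
  )
  out <- out[order(out$group), ]
  rownames(out) <- NULL
  out
})

# Sanity checks: number of datasets, and no value kept twice in any of them
length(dfs)
all(vapply(dfs, function(d) anyDuplicated(d$values) == 0L, logical(1)))
```

With the example data, the two checks should report six datasets and no repeated value in any of them.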