I have a dataset of pairs and their frequencies, as in the example below. The goal is to select a set of pairs in which every name is used exactly once and the total count (frequency) is as high as possible.
Person 1 | Person 2 | Count |
---|---|---|
A | B | 4 |
A | D | 4 |
A | C | 3 |
B | C | 2 |
C | D | 1 |
B | D | 0 |
A, B, C and D are names of people and count is the frequency of a combination of two people. In this example the highest count can be reached by having an AD and BC combination, which sums to 6 (4+2). If we take AB and CD as a combination the total sum of count will be lower (5, 4+1).
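The arithmetic can be checked with a small sketch. It assumes the example table is stored in a data frame called `pairs` with columns `p1`, `p2` and `count` (names chosen only for this illustration, not my real column names), and it scores each of the three possible ways to split four people into two pairs.

```r
# Hypothetical data frame holding the example table
pairs <- data.frame(
  p1    = c("A", "A", "A", "B", "C", "B"),
  p2    = c("B", "D", "C", "C", "D", "D"),
  count = c(4, 4, 3, 2, 1, 0)
)

# Look up the count of a pair, whichever order the two names are given in
pair_count <- function(a, b) {
  hit <- (pairs$p1 == a & pairs$p2 == b) | (pairs$p1 == b & pairs$p2 == a)
  pairs$count[hit]
}

# The three ways to split A, B, C, D into two pairs
totals <- c(
  "AB + CD" = pair_count("A", "B") + pair_count("C", "D"),
  "AC + BD" = pair_count("A", "C") + pair_count("B", "D"),
  "AD + BC" = pair_count("A", "D") + pair_count("B", "C")
)
print(totals)
```

Here AD + BC gives 6, AB + CD gives 5 and AC + BD gives 3, so AD + BC is the pairing I am after.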
I would like to have a dataset looking like this as an answer:
Person 1 | Person 2 | Count |
---|---|---|
A | D | 4 |
B | C | 2 |
How can I create this dataset from the original, without duplicate names and with the highest possible total count? If there is an AD combination, there cannot be another combination that includes A or D.
I tried the following code, but it does not give me the desired dataset:
dat <- data %>%
  arrange(desc(count))
count = 0
while (nrow(dat) > 0) {
  print(dat[1, ])
  dat <- dat %>%
    filter(!(X1 == X1[1] | X1 == X2[1] | X2 == X1[1] | X2 == X2[1]))
}
dat is the arranged dataset shown in the first table. I print the first row, which has the highest count, and then delete every combination that contains one of its names (because each name can be used only once). This is repeated until there are no people left.
This code gives the following dataset:
Person 1 | Person 2 | Count |
---|---|---|
A | B | 4 |
C | D | 1 |
Thank you in advance.
There is probably a more elegant solution with igraph, but here is my approach:
Using your data
your_data <- tibble::tribble(
  ~Person.1, ~Person.2, ~Count,
  "A", "B", 4L,
  "A", "D", 4L,
  "A", "C", 3L,
  "B", "C", 2L,
  "C", "D", 1L,
  "B", "D", 0L
)
and assuming Person.1 and Person.2 are in alphabetical order within each row, you can do
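library(dplyr)   # arrange(), left_join(), first()
library(purrr)   # map(), map_dfr(), set_names()

with(your_data, unique(c(Person.1, Person.2))) %>%
  # for every permutation of the names, pair up neighbours: (1,2), (3,4), ...
  combinat::permn(\(x) split(x, (seq_along(x) + 1) %/% 2) %>%
    map(sort) %>%
    map_dfr(set_names, c("Person.1", "Person.2"))) %>%
  map(~ arrange(.x, Person.1)) %>%
  unique() %>%                          # drop duplicate pairings
  map(~ left_join(.x, your_data, by = c("Person.1", "Person.2"))) %>%
  rlist::list.sort((sum(Count))) %>%    # parentheses mean descending order
  first()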
returning the desired
# A tibble: 2 x 3
  Person.1 Person.2 Count
  <chr>    <chr>    <int>
1 A        D            4
2 B        C            2
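If you would rather avoid the extra packages, the same exhaustive search can be sketched in base R. This is only a rough alternative, not a tuned implementation: the helper names `pair_count` and `best_pairing` are made up for this example, it assumes an even number of people, and it treats any pair missing from the table as having a count of 0. It pairs the first remaining person with each possible partner, recurses on the rest, and keeps the best total.

```r
# Example data from the question (same column names as above)
your_data <- data.frame(
  Person.1 = c("A", "A", "A", "B", "C", "B"),
  Person.2 = c("B", "D", "C", "C", "D", "D"),
  Count    = c(4L, 4L, 3L, 2L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Count for a pair of names, regardless of order; 0 if the pair is not listed
pair_count <- function(a, b, d) {
  hit <- (d$Person.1 == a & d$Person.2 == b) |
         (d$Person.1 == b & d$Person.2 == a)
  if (any(hit)) d$Count[which(hit)[1]] else 0L
}

# Pair the first remaining person with every other remaining person,
# solve the rest recursively, and keep the pairing with the highest total.
# Returns list(total = <sum of counts>, pairs = <data frame of chosen pairs>).
best_pairing <- function(people, d) {
  if (length(people) < 2) {
    return(list(total = 0L, pairs = NULL))
  }
  first <- people[1]
  best <- NULL
  for (partner in people[-1]) {
    rest  <- setdiff(people, c(first, partner))
    sub   <- best_pairing(rest, d)
    total <- sub$total + pair_count(first, partner, d)
    if (is.null(best) || total > best$total) {
      row <- data.frame(Person.1 = first, Person.2 = partner,
                        Count = pair_count(first, partner, d),
                        stringsAsFactors = FALSE)
      best <- list(total = total, pairs = rbind(row, sub$pairs))
    }
  }
  best
}

people <- sort(unique(c(your_data$Person.1, your_data$Person.2)))
result <- best_pairing(people, your_data)
print(result$pairs)
```

For the example data this picks A-D and B-C with a total of 6. Like the permutation approach, it checks every possible pairing, so it is only practical for a small number of people.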