I've a dataframe as under:
+------+-----+----------+
| from | to | priority |
+------+-----+----------+
| 1 | 8 | 1 |
| 2 | 6 | 1 |
| 3 | 4 | 1 |
| 4 | 5 | 3 |
| 5 | 6 | 4 |
| 6 | 2 | 5 |
| 7 | 8 | 2 |
| 4 | 3 | 5 |
| 2 | 1 | 1 |
| 6 | 6 | 4 |
| 1 | 7 | 5 |
| 8 | 4 | 6 |
| 9 | 5 | 3 |
+------+-----+----------+
my goal is to group the "to" column based on the from column but in such a way that if a variable is already present in either of the columns, I wouldn't want to take them into consideration further Also, the total priority would be a sum of all the group priorities
So the resulting dataframe would be as under:
+------+------+----------------+
| from | to | Total Priority |
+------+------+----------------+
| 1 | 8, 7 | 6 |
| 2 | 6 | 1 |
| 3 | 4 | 1 |
| 9 | 5 | 3 |
+------+------+----------------+
Also, I'd want to keep the same order as of the original table while grouping
I was able to collapse the from column using the "splitstackshape" package as under
library(splitstackshape)
cSplit(df, 'to', sep = ','
+ , direction = 'long')[, .(to = toString(unique(to)))
+ , by = from]
This does introduce dupicate values I was wondering if there's a way to get the desired result using any other packages
Using DF
shown reproducibly in the Note at the end, sort by from
giving DF2
and then iterate through its rows removing any row with a duplicate. We need a loop here since each removal depends on the prior ones. Finally summarize the result.
library(dplyr)
DF2 <- arrange(DF, from)
i <- 1
while(i <= nrow(DF2)) {
ix <- seq_len(i-1)
dup <- with(DF2, (to[i] %in% c(to[ix], from[ix])) | (from[i] %in% to[ix]))
if (dup) DF2 <- DF2[-i, ] else i <- i + 1
}
DF2 %>%
group_by(from) %>%
summarize(to = toString(to), priority = sum(priority)) %>%
ungroup
giving:
# A tibble: 4 x 3
from to priority
<int> <chr> <int>
1 1 8, 7 6
2 2 6 1
3 3 4 1
4 9 5 3
Lines <- "from | to | priority
1 | 8 | 1
2 | 6 | 1
3 | 4 | 1
4 | 5 | 3
5 | 6 | 4
6 | 2 | 5
7 | 8 | 2
4 | 3 | 5
2 | 1 | 1
6 | 6 | 4
1 | 7 | 5
8 | 4 | 6
9 | 5 | 3"
DF <- read.table(text = Lines, header = TRUE, sep = "|", strip.white = TRUE)