I have data that looks like this:
Tab4 <- read.table(text = "
nodepair `++` `--` `+-` `-+` `0+` `+0` `0-` `-0` `00` ES
1 A1_A1 0 4 0 0 0 0 0 0 16 3
2 A1_A1 0 5 0 0 0 0 0 0 16 4
3 A1_A1 0 5 0 0 0 0 0 0 15 5
", header = TRUE)
I've written the following code so that, within each nodepair, every pair of ES groups is compared:
library(dplyr)
library(purrr)

ES_combs <- combn(unique(Tab4$ES), 2, simplify = FALSE)
Tab5 <- Tab4 %>% # compare every pair of ES groups with each other
  group_split(nodepair) %>%
  map(.f = function(df) df %>%
        map(.x = 1:length(ES_combs),
            .f = ~df %>%
              filter(ES %in% ES_combs[[.x]]) %>%
              summarize(nodepair = first(nodepair),
                        ES_1 = ES[1],
                        ES_2 = ES[2],
                        across(2:10, ~as.numeric(.))))) %>%
  bind_rows()
resulting in this:
Tab5 <- read.table(text = "
nodepair ES_1 ES_2 `++` `--` `+-` `-+` `0+` `+0` `0-` `-0` `00`
1 A1_A1 3 4 0 4 0 0 0 0 0 0 16
2 A1_A1 3 4 0 5 0 0 0 0 0 0 16
3 A1_A1 3 5 0 4 0 0 0 0 0 0 16
4 A1_A1 3 5 0 5 0 0 0 0 0 0 15
5 A1_A1 4 5 0 5 0 0 0 0 0 0 16
6 A1_A1 4 5 0 5 0 0 0 0 0 0 15
", header = TRUE)
This works, but it takes far too long on my full dataset. Is there a more efficient way to do this? I suspect the following warning points to part of the problem:
Warning messages:
1: Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped
data frame and adjust accordingly.
but I'm not sure where to go from here.
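We can self-join the table on nodepair and then drop the rows where an ES value is paired with itself. Each remaining row carries the counts of ES1, and every pair appears in both orders (3/4 and 4/3):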
out <- merge(Tab4, Tab4[, c("nodepair", "ES")], by = "nodepair", suffixes = c("1", "2"), all = TRUE)
out[out$ES1 != out$ES2, ]
nodepair X.... X.....1 X.....2 X.....3 X.0.. X..0. X.0...1 X..0..1 X.00. ES1 ES2
2 A1_A1 0 4 0 0 0 0 0 0 16 3 4
3 A1_A1 0 4 0 0 0 0 0 0 16 3 5
4 A1_A1 0 5 0 0 0 0 0 0 16 4 3
6 A1_A1 0 5 0 0 0 0 0 0 16 4 5
7 A1_A1 0 5 0 0 0 0 0 0 15 5 3
8 A1_A1 0 5 0 0 0 0 0 0 15 5 4
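
If each pair should appear only once, with one row per ES as in the Tab5 layout from the question, the same join idea can probably be extended along these lines. This is only a sketch under a few assumptions: ES values can be ordered with `<`, the column names are the ones read.table() generates from the backticked headers, and the helper names keys, pairs, pairs_long and res are made up for illustration.

```r
# Sample data from the question; read.table() renames the backticked
# headers to syntactic names such as X.... and X.....1.
Tab4 <- read.table(text = "
nodepair `++` `--` `+-` `-+` `0+` `+0` `0-` `-0` `00` ES
1 A1_A1 0 4 0 0 0 0 0 0 16 3
2 A1_A1 0 5 0 0 0 0 0 0 16 4
3 A1_A1 0 5 0 0 0 0 0 0 15 5
", header = TRUE)

# Step 1: all ES pairs within each nodepair, via a self-join on the keys only.
keys  <- Tab4[, c("nodepair", "ES")]
pairs <- merge(keys, keys, by = "nodepair", suffixes = c("_1", "_2"))

# Step 2: keep each unordered pair once (assumes ES can be ordered with `<`).
pairs <- pairs[pairs$ES_1 < pairs$ES_2, ]

# Step 3: one row per member of each pair, then attach that member's counts.
pairs_long <- rbind(transform(pairs, ES = ES_1),
                    transform(pairs, ES = ES_2))
res <- merge(pairs_long, Tab4, by = c("nodepair", "ES"))

# Step 4: sort so that the two rows of a pair sit next to each other.
res <- res[order(res$nodepair, res$ES_1, res$ES_2, res$ES), ]
rownames(res) <- NULL
res
```

Joining only the key columns first keeps the intermediate table narrow, and the count columns are attached in a single merge at the end instead of once per combination.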