
Using combn to make specific functions for grouped pair-wise, row-wise comparisons


This is a small section of a dataset I'm working on.

dat2 <- read.table(text = "
   nodepair  V1  V2  V3  V4  V5  V6  V7  V8  V9 ES   
 1 A1_A1        0    21     0     0     0     0     0     0    78 45   
 2 A2_A1        0     0     0     0     0     0     0     0    99 45   
 3 A2_A2        0     1     0     0     0     0     0     0    98 45   
 4 A3_A1        0     0     0     0     0     6     1     3    89 45   
 5 A3_A2        0     0     0     0     0     0     0     0    99 45   
 6 A1_A1        0    20     0     0     0     0     0     0    65 46   
 7 A2_A1        0     0     0     0     0     0     0     0    85 46   
 8 A2_A2        0     1     0     0     0     0     0     0    84 46   
 9 A3_A1        0     0     0     0     2     6     3     3    71 46   
 10 A3_A2        0     0     0     0     0     0     0     0    85 46   
 11 A1_A1        0    25     0     0     0     0     0     0    45 47   
 12 A2_A1        0     0     0     0     0     0     0     0    70 47   
 13 A2_A2        0     1     0     0     0     0     0     0    69 47   
 14 A3_A1        0     0     0     0     0     8     0     1    61 47   
 15 A3_A2        0     0     0     0     0     0     0     0    70 47   
 16 A1_A1        0    37     0     0     0     0     0     0    77 48   
 17 A2_A1        0     0     0     0     0     0     0     0   114 48   
 18 A2_A2        0     0     0     0     0     0     0     0   114 48   
 19 A3_A1        0     0     0     0     2     9     0     3   100 48   
 20 A3_A2        0     0     0     0     0     0     0     0   114 48   
 ", header = TRUE)

I'm trying to write a program that will do all pairwise comparisons (grouped by the nodepair) across the 'ES' groups.

I'd like to write a series of functions that compare each pair of rows. For example, when a column in V1:V9 is > 0 for both ES values, the result for that column should be 1 (indicating presence of data), and 0 otherwise.

I'm imagining the output to look something like this:

 dat3 <- read.table(text = "
    nodepair1 nodepair2  V1  V2  V3  V4  V5  V6  V7  V8  V9    
    A1_A1(45) A1_A1(46)   0     1    0     0     0     0     0     0     1
  ", header = TRUE)

etc.

Unfortunately, I haven't gotten very far:

 dat2 <- dat2 %>%
   group_by(nodepair) %>%
   col2 = t(combn(nodepair,2)))

I'm pretty sure I need 'combn' here, but I'm very new to this function and can't figure it out.


Solution

  • Now that the OP has clarified their question, I'd propose the following solution:

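    library(tidyverse)
    
    # All unordered pairs of ES values, as a list of length-2 vectors
    ES_combs <- combn(unique(dat2$ES), 2, simplify = FALSE)
    
    dat2 |> 
      group_split(nodepair) |>                # one data frame per nodepair
      map(.f = \(df)                          # outer map: one nodepair group at a time
            map(.x = seq_along(ES_combs),     # inner map: one ES pair at a time
                .f = ~df |> 
                   filter(ES %in% ES_combs[[.x]]) |> 
                   summarize(nodepair = first(nodepair),
                             ES_1 = ES[1],
                             ES_2 = ES[2],
                             # 1 if the column is > 0 in both rows, otherwise 0
                             across(V1:V9, ~as.numeric(all(. > 0)))))) |> 
      bind_rows()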
    

    which gives:

    # A tibble: 30 × 12
       nodepair  ES_1  ES_2    V1    V2    V3    V4    V5    V6    V7    V8    V9
       <chr>    <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
     1 A1_A1       45    46     0     1     0     0     0     0     0     0     1
     2 A1_A1       45    47     0     1     0     0     0     0     0     0     1
     3 A1_A1       45    48     0     1     0     0     0     0     0     0     1
     4 A1_A1       46    47     0     1     0     0     0     0     0     0     1
     5 A1_A1       46    48     0     1     0     0     0     0     0     0     1
     6 A1_A1       47    48     0     1     0     0     0     0     0     0     1
     7 A2_A1       45    46     0     0     0     0     0     0     0     0     1
     8 A2_A1       45    47     0     0     0     0     0     0     0     0     1
     9 A2_A1       45    48     0     0     0     0     0     0     0     0     1
    10 A2_A1       46    47     0     0     0     0     0     0     0     0     1
    11 A2_A1       46    48     0     0     0     0     0     0     0     0     1
    12 A2_A1       47    48     0     0     0     0     0     0     0     0     1
    13 A2_A2       45    46     0     1     0     0     0     0     0     0     1
    14 A2_A2       45    47     0     1     0     0     0     0     0     0     1
    15 A2_A2       45    48     0     0     0     0     0     0     0     0     1
    16 A2_A2       46    47     0     1     0     0     0     0     0     0     1
    17 A2_A2       46    48     0     0     0     0     0     0     0     0     1
    18 A2_A2       47    48     0     0     0     0     0     0     0     0     1
    19 A3_A1       45    46     0     0     0     0     0     1     1     1     1
    20 A3_A1       45    47     0     0     0     0     0     1     0     1     1
    21 A3_A1       45    48     0     0     0     0     0     1     0     1     1
    22 A3_A1       46    47     0     0     0     0     0     1     0     1     1
    23 A3_A1       46    48     0     0     0     0     1     1     0     1     1
    24 A3_A1       47    48     0     0     0     0     0     1     0     1     1
    25 A3_A2       45    46     0     0     0     0     0     0     0     0     1
    26 A3_A2       45    47     0     0     0     0     0     0     0     0     1
    27 A3_A2       45    48     0     0     0     0     0     0     0     0     1
    28 A3_A2       46    47     0     0     0     0     0     0     0     0     1
    29 A3_A2       46    48     0     0     0     0     0     0     0     0     1
    30 A3_A2       47    48     0     0     0     0     0     0     0     0     1
    

    This probably needs a bit of explanation:

    • We start by creating all pairwise combinations of the ES values in your data frame and store them in a list, ES_combs. Each element is a vector of two ES values.
    • We then split your data by nodepair into a list, where each list element is the data for one nodepair group.
    • The outer map goes through each group's data frame. It is important to define a named anonymous function (\(df)) here: the inner map already uses .x, so we can't use the .x parameter twice.
    • The inner map takes each pair from ES_combs and filters the current group's data down to the two rows with those ES values. The summarize step then returns 1 for every column in V1:V9 where both rows are > 0, and 0 otherwise.
    • As a last step, we use bind_rows to merge everything into a nice tibble instead of having an annoyingly long list.
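
If the nested maps are hard to follow, the same steps can be sketched in base R on a single nodepair group. This is only an illustration, not a replacement for the code above: the toy data frame holds the four A3_A1 rows from the question (columns V5 to V9 only), and compare_pair() is a hypothetical helper name I made up for this example.

```r
# Toy data: the four A3_A1 rows from the question, columns V5:V9 only
a3_a1 <- data.frame(
  ES = c(45L, 46L, 47L, 48L),
  V5 = c(0, 2, 0, 2),
  V6 = c(6, 6, 8, 9),
  V7 = c(1, 3, 0, 0),
  V8 = c(3, 3, 1, 3),
  V9 = c(89, 71, 61, 100)
)

# combn() with simplify = FALSE returns a list of length-2 vectors,
# one per unordered pair: (45,46), (45,47), (45,48), (46,47), (46,48), (47,48)
es_pairs <- combn(a3_a1$ES, 2, simplify = FALSE)

# compare_pair() is a hypothetical helper: it keeps the two rows of one
# ES pair and flags each V column with 1 if both values are > 0, else 0
compare_pair <- function(pair, df) {
  rows <- df[df$ES %in% pair, ]
  v_cols <- setdiff(names(df), "ES")
  flags <- vapply(rows[v_cols],
                  function(col) as.numeric(all(col > 0)),
                  numeric(1))
  data.frame(ES_1 = pair[1], ES_2 = pair[2], t(flags))
}

# One result row per ES pair, stacked into a single data frame
result <- do.call(rbind, lapply(es_pairs, compare_pair, df = a3_a1))
result
```

The six rows of result should match the V5 to V9 columns of rows 19 to 24 in the tibble above. To cover every nodepair, the same helper could be applied to each piece of split(dat2, dat2$nodepair).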