
Permute labels of a matrix while preserving the pairing of the samples


I have a matrix with 490 rows (features F1..F490) and 350 columns (samples s1..s350), plus annotation rows for Drug, Sample, Pair and Cond. The first columns look like this:

    Drug    T    T    T    C    T
    Sample  s1   s2   s3   s4   s5   .....
    Pair    16   81   -16  32   -81  .....
    Cond    B    D    B    B    D    .....
    F1      34   23   12        9    .....
    F2      78        11   87   10   .....
    ...

(Some values are missing; that is expected.)

There are two conditions (B and D) and two drugs (T and C). The samples are paired: for example, s1 and s3 form a pair because their Pair values are equal in absolute value (16 and -16).

What I'm trying to do is permute the Drug labels 1000 times while preserving the pairing information (the Pair value). A pair should always keep the same condition (B in this example) and the same Pair values (16 and -16 in this example), and both members must also carry the same drug label. For example, s1 and s3 are a pair: they have the same absolute Pair value, are both B, and both have the drug label T.
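The pairing constraint can be written as a small check, which is handy for validating any generated permutation before feeding it downstream. This is a minimal base R sketch, assuming the Drug and Pair rows have already been extracted as vectors in the same sample order; is_valid_labelling() is a hypothetical helper name, and the toy vectors use values from the example tables in this question (the third vector deliberately breaks the s1/s3 pair).

```r
# Hypothetical helper (not from any package).
# Assumes `drug` and `pair` are vectors with one entry per sample, same order.
# A labelling is valid when both members of every pair share one Drug label.
is_valid_labelling <- function(drug, pair) {
  pair_id <- abs(pair)   # s1 (16) and s3 (-16) share the key 16
  # Count distinct drug labels within each pair; a valid pair has exactly one
  n_labels <- tapply(drug, pair_id, function(x) length(unique(x)))
  all(n_labels == 1)
}

# Toy data: the first five samples from the question
pair     <- c(16, 81, -16, 32, -81)
original <- c("T", "T", "T", "C", "T")
permuted <- c("C", "T", "C", "T", "T")   # the example permuted file
broken   <- c("C", "T", "T", "T", "T")   # s1 and s3 disagree

is_valid_labelling(original, pair)   # TRUE
is_valid_labelling(permuted, pair)   # TRUE
is_valid_labelling(broken, pair)     # FALSE
```

Only the Drug labels are tested here; Cond and Pair never change under a label permutation, so they cannot break the constraint.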

For example, one of the 1000 permuted files could look like this:

    Drug    C    T    C    T    T
    Sample  s1   s2   s3   s4   s5   .....
    Pair    16   81   -16  32   -81  .....
    Cond    B    D    B    B    D    .....
    F1      34   23   12        9    .....
    F2      78        11   87   10   .....
    ...

I don't mind if the samples are not in order.

I've tried permute and sample() in R, but I can't find a way to do it while satisfying the constraints described above. Sorry if this is obvious.

I want to use these permuted files (n = 1000) in a downstream analysis that I have already coded.

Thank you very much for your input.


Solution

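  • Given the data df: transpose it so that each sample is a row, group the samples by the absolute value of Pair, keep one sample per pair, and permute Drug across those pairs. Finally, join the permuted labels back onto every sample by the absolute Pair value, so both members of a pair receive the same new label. Using dplyr: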

    library(dplyr)
    
    t_df <- as.data.frame(t(df))                    # transposed to use features as cols
    t_df$Pair <- as.numeric(as.character(t_df$Pair))
    
    # Wrap this into a function to call/permute 1000 times
    df_out <- t_df %>%
      mutate(abs_pair = abs(Pair)) %>%
      group_by(abs_pair) %>%
      filter(row_number() == 1) %>%                 # one representative sample per pair
      ungroup() %>%
      mutate(Permuted_drug = sample(Drug, n())) %>% # shuffle Drug labels across pairs
      select(abs_pair, Permuted_drug) %>%
      inner_join(t_df %>% mutate(abs_pair = abs(Pair)), by = "abs_pair")  # copy label to both members
    
    df_out    # random: differs between runs unless set.seed() is called first
    #  abs_pair Permuted_drug Drug  Sample  Pair Cond 
    #     <dbl> <fct>         <fct> <fct>  <dbl> <fct>
    #1       16 T             T     s1        16 B    
    #2       16 T             T     s3       -16 B    
    #3       81 C             T     s2        81 D    
    #4       81 C             T     s5       -81 D    
    #5       32 T             C     s4        32 B    
    

    Data Used:

    df <- read.table(text = "Drug    T   T   T   C   T
    Sample  s1  s2  s3  s4  s5
    Pair    16  81 -16  32 -81
    Cond    B   D    B   B  D", row.names = 1)
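The comment in the answer suggests wrapping the step in a function and calling it 1000 times. A base R variant of the same idea (read one Drug label per pair, shuffle those labels across pairs, copy each pair's new label back to both members) can work on the original wide layout directly, so no transposition or dplyr is needed. This is a sketch under stated assumptions: permute_drug_by_pair() is a hypothetical helper, the input is assumed to be a character matrix with rows named "Drug" and "Pair" and one column per sample, and the toy matrix reuses the first five samples from the question. The file-writing loop is left commented out, and its file names are hypothetical.

```r
# Hypothetical helper (not from any package): permute Drug labels across pairs.
# Assumes `m` is a character matrix laid out as in the question, with rows
# named "Drug" and "Pair" and one column per sample.
permute_drug_by_pair <- function(m) {
  pair_id   <- abs(as.numeric(m["Pair", ]))   # s1 (16) and s3 (-16) share key 16
  keys      <- unique(pair_id)                # one key per pair
  # Current label of each pair, read from its first member (members agree)
  pair_drug <- m["Drug", match(keys, pair_id)]
  # Shuffle the labels across pairs, then copy each pair's new label
  # back to both of its members
  shuffled  <- sample(pair_drug)
  m["Drug", ] <- shuffled[match(pair_id, keys)]
  m
}

# Toy input: the first five samples from the question
m <- rbind(
  Drug   = c("T", "T", "T", "C", "T"),
  Sample = c("s1", "s2", "s3", "s4", "s5"),
  Pair   = c("16", "81", "-16", "32", "-81"),
  Cond   = c("B", "D", "B", "B", "D"),
  F1     = c("34", "23", "12", NA, "9")
)
colnames(m) <- m["Sample", ]

set.seed(42)   # reproducible permutations
perms <- replicate(1000, permute_drug_by_pair(m), simplify = FALSE)

# Each element of perms is one permuted matrix. To write them out
# (hypothetical file names):
# for (i in seq_along(perms))
#   write.table(perms[[i]], sprintf("perm_%04d.txt", i),
#               sep = "\t", quote = FALSE, col.names = NA)
```

Because only the Drug row is rewritten, Cond, Pair and every feature value stay attached to their samples, and the sample order is unchanged.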