R: combn function and define names of generated variables


I have a data frame named “dat” with 5 numeric variables (var1, var2, var3, var4, var5), each with 20 observations.

structure(list(var_1 = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20), var_2 = c(7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26), var_3 = c(4, 
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 
22, 23), var_4 = c(2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 
15, 16, 17, 18, 19, 20, 21)), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))

I am using this code to compute the row-wise mean of every possible pair of the 5 variables and save the results in a new data frame (named “combined”):

combined <- combn(dat, 2, FUN = rowMeans)

This is the result:

structure(c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 
18, 19, 20, 21, 22, 23, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 
10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5, 
21.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 
12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 19.5, 20.5, 5.5, 6.5, 
7.5, 8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 
18.5, 19.5, 20.5, 21.5, 22.5, 23.5, 24.5, 4.5, 5.5, 6.5, 7.5, 
8.5, 9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5, 
19.5, 20.5, 21.5, 22.5, 23.5, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14, 15, 16, 17, 18, 19, 20, 21, 22), .Dim = c(20L, 6L))

1) The code works fine, but the newly generated variables in the data frame “combined” are named V1, V2, V3, V4…, so I cannot tell which pair of variables each one comes from. I would prefer the new variables to be named “var1var2”, “var1var3” and so on. Is there a way to do this?

2) Also, is there a way to apply the combn function only to some columns rather than to all the variables in the data frame “dat”?

3) How can I add the newly generated variables to the original data frame “dat” rather than saving them in a new one?

Thank you so much for your help!


Solution

  • This can be done by running combn a second time on the column names and using the result as the column names of the combined matrix

    set.seed(99)
    dat <- data.frame(var1 = sample(20),           #some sample data
                      var2 = sample(20),           #I did this before you added your data above!
                      var3 = sample(20),
                      var4 = sample(20),
                      var5 = sample(20))
    
    dat
       var1 var2 var3 var4 var5
    1    12    5   18   19   12
    2     3    2   10    6   13
    3    13   15   14   13    1
    4    17   11   16   18   10
    5     9   13    8    8    7
    6    15    6   20   17    3  
    ...
    
    combined <- combn(dat, 2, FUN = rowMeans)      #your statement using cols of dat
    
    colnames(combined) <- combn(names(dat), 2, paste0, collapse="") #same using colnames
    
    combined
    
          var1var2 var1var3 var1var4 var1var5 var2var3 var2var4 var2var5 var3var4 var3var5 var4var5
     [1,]      8.5     15.0     15.5     12.0     11.5     12.0      8.5     18.5     15.0     15.5
     [2,]      2.5      6.5      4.5      8.0      6.0      4.0      7.5      8.0     11.5      9.5
     [3,]     14.0     13.5     13.0      7.0     14.5     14.0      8.0     13.5      7.5      7.0
     [4,]     14.0     16.5     17.5     13.5     13.5     14.5     10.5     17.0     13.0     14.0
     [5,]     11.0      8.5      8.5      8.0     10.5     10.5     10.0      8.0      7.5      7.5
     [6,]     10.5     17.5     16.0      9.0     13.0     11.5      4.5     18.5     11.5     10.0
     ...
    
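    The labels line up with the columns because combn enumerates the pairs in the same order whether it is given the data frame or its names. A minimal sketch on a small made-up data frame (toy and its columns a, b, c are hypothetical, not taken from the question):

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6))

# Pairs of columns reduced to row means: one matrix column per pair
means <- combn(toy, 2, FUN = rowMeans)

# Pairs of names pasted together: one label per pair, in the same order
labels <- combn(names(toy), 2, FUN = paste0, collapse = "")

colnames(means) <- labels
means
```

    Because both calls walk the pairs as (1,2), (1,3), (2,3), the label in position k always describes column k of the means matrix.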

    To answer your other points: you can restrict the columns by subsetting, for example by using dat[, c(2, 3, 5)] in place of dat in both combn statements (to use columns 2, 3 and 5). You can add the new columns to the same data frame with dat <- cbind(dat, combined)
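
    Putting these two points together, here is a minimal sketch that averages only selected columns and appends the result to dat. The choice of columns and the helper name keep are illustrative assumptions; dat is rebuilt here only so the snippet runs on its own.

```r
set.seed(99)
dat <- data.frame(var1 = sample(20), var2 = sample(20), var3 = sample(20),
                  var4 = sample(20), var5 = sample(20))

keep <- c("var1", "var2", "var5")   # hypothetical choice of columns to combine

# Row means for every pair of the selected columns only
combined <- combn(dat[, keep], 2, FUN = rowMeans)

# Matching labels, built from the same selection in the same order
colnames(combined) <- combn(keep, 2, FUN = paste0, collapse = "")

# Append the pair means to the original data frame
dat <- cbind(dat, combined)
names(dat)
```

    Selecting by name rather than by position is a design choice here: it keeps the selection valid if columns are later reordered or added to dat.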