r, dplyr, tidyverse

Pairwise combinations of a large number of variables using mutate()


I have a very large number of variables in a dataframe that represent binary outcomes (0, 1). I am trying to create a new dataframe that contrasts each pair of variables using an ifelse() condition.

Here is a minimal example that replicates what I am trying to achieve.

# initial dataframe
set.seed(123)
df1 <- data.frame(sf1 = sample(0:1, 10, replace = TRUE), sf2 = sample(0:1, 10, replace = TRUE), sf3 = sample(0:1, 10, replace = TRUE))

# get all pairwise combinations of col names
two_way <- combn(colnames(df1), 2, FUN=paste, collapse='&')

# create contrasts of col names
dfs <- paste('ifelse(', two_way, ',1,0)', sep = '')

dfs
[1] "ifelse(sf1&sf2,1,0)" "ifelse(sf1&sf3,1,0)" "ifelse(sf2&sf3,1,0)"

This creates a character vector of the desired ifelse() conditions. What I now want to do is pass those conditions to mutate() to create a new dataframe with all my contrasts. Something like this:

df2 <- df1 %>%
        mutate(
            data.frame(
               ifelse(sf1&sf2,1,0),
               ifelse(sf1&sf3,1,0),
               ifelse(sf2&sf3,1,0),
               check.names = FALSE
            )
        )

df2
   sf1 sf2 sf3 ifelse(sf1 & sf2, 1, 0) ifelse(sf1 & sf3, 1, 0) ifelse(sf2 & sf3, 1, 0)
1    0   1   0                       0                       0                       0
2    0   0   0                       0                       0                       0
3    0   1   0                       0                       0                       0
4    0   0   1                       0                       0                       0
5    1   0   1                       0                       1                       0
6    1   1   1                       1                       1                       1
7    1   1   1                       1                       1                       1
8    0   1   1                       0                       0                       1
9    0   1   0                       0                       0                       0
10   1   1   0                       1                       0                       0 

How can I pass the character vector of contrasts, dfs, to the mutate function? Is this possible, or is there a better way of achieving the desired outcome?


Solution

  • ifelse(sf1&sf2,1,0) can also be written as as.integer(sf1&sf2).

    I'll do this in base R using combn. Use combn to generate every pair of column names; for each pair, pull the two columns from the original dataset and return a named list holding the resulting 1/0 integer vector. These lists are then appended to the original dataset using cbind.

    cbind(df1, combn(names(df1), 2, \(x) {
      # x holds one pair of column names, e.g. c("sf1", "sf2")
      setNames(
        list(as.integer(df1[[x[1]]] & df1[[x[2]]])),
        paste0(x, collapse = "_")  # new column name, e.g. "sf1_sf2"
      )
    }, simplify = FALSE))
    
    #   sf1 sf2 sf3 sf1_sf2 sf1_sf3 sf2_sf3
    #1    0   1   0       0       0       0
    #2    0   1   1       0       0       1
    #3    0   1   0       0       0       0
    #4    1   0   0       0       0       0
    #5    0   1   0       0       0       0
    #6    1   0   0       0       0       0
    #7    1   1   1       1       1       1
    #8    1   0   1       0       1       0
    #9    0   0   0       0       0       0
    #10   0   0   1       0       0       0
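
If you would rather stay with mutate() and a character vector of conditions, as in the question, one possible sketch is to parse each string into a call and splice the resulting list into mutate() with `!!!`. This assumes dplyr is installed and R >= 4.1 (for the `\(p)` lambda syntax); the names `pairs`, `new_names` and `exprs` are illustrative and not part of the original post.

```r
library(dplyr)

set.seed(123)
df1 <- data.frame(
  sf1 = sample(0:1, 10, replace = TRUE),
  sf2 = sample(0:1, 10, replace = TRUE),
  sf3 = sample(0:1, 10, replace = TRUE)
)

# All pairs of column names, as a list of length-2 character vectors
pairs <- combn(names(df1), 2, simplify = FALSE)

# One condition string and one new column name per pair
dfs <- vapply(pairs, \(p) sprintf("as.integer(%s & %s)", p[1], p[2]), character(1))
new_names <- vapply(pairs, paste, character(1), collapse = "_")

# Parse each string into a call, name it, and splice the list into mutate()
exprs <- setNames(lapply(dfs, str2lang), new_names)
df2 <- mutate(df1, !!!exprs)

names(df2)
```

Because the list passed to `!!!` is named, the new columns come out as sf1_sf2, sf1_sf3 and sf2_sf3 rather than as the raw expression text. Parsing strings is only needed when the conditions already exist as text; the base R combn approach above avoids that step entirely.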