How to mutate new columns across all combinations of other columns?


My point of departure is the whigs data from the ggraph package. It contains an incidence matrix.

Now, for each combination of columns/variables, I'd like to know whether all of its columns are 1, and create a new column for that combination containing a 1 if they all are and a 0 if not.

The whigs data is just an example: I'm looking for a vectorized method that can be used regardless of the number of columns/combinations.

Using dplyr, I can use across() in the mutate() function to create multiple new columns, but I can't figure out how to create those columns on the basis of the various combinations of columns.

Also using dplyr, I can use c_across() in the mutate() function, in tandem with the rowwise() function, to create a single new column based on the values in multiple columns.
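
As a minimal sketch of those two patterns, on a made-up toy data frame rather than the whigs data (the names `toy`, `doubled`, `flagged`, `all_one` and the `_x2` suffix are purely illustrative):

```r
library(dplyr)

# Toy 0/1 data, standing in for an incidence matrix
toy <- data.frame(A = c(1, 0, 1), B = c(1, 1, 0), C = c(1, 1, 1))

# across(): one new column per existing column
doubled <- toy %>%
  mutate(across(everything(), ~ .x * 2, .names = "{.col}_x2"))

# rowwise() + c_across(): a single new column computed from several columns
flagged <- toy %>%
  rowwise() %>%
  mutate(all_one = as.integer(all(c_across(c(A, B)) == 1))) %>%
  ungroup()
```

Neither pattern on its own produces one new column per combination of columns, which is what I'm after.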

Maybe these two can be combined somehow?


Solution

  • You could try

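    library(dplyr)
    df <- data.frame(A = rep(0, 4),
                     B = c(1, 0, 0, 1),
                     C = c(0, 1, 1, 0),
                     D = c(0, 1, 1, 1))
    cols <- seq_len(ncol(df))

    # All combinations of two or more column indices, flattened into one list
    # (lapply avoids sapply's automatic simplification of the result)
    combs <- unlist(lapply(cols[-1], function(m) {
      asplit(combn(cols, m = m), 2)
    }), recursive = FALSE)

    # Add one 0/1 column per combination, named by its indices joined with "/"
    for (x in combs) {
      df <- df %>% mutate(!!paste0(x, collapse = "/") := as.numeric(rowSums(df[, x]) == length(x)))
    }
    df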
    

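    We create all combinations of column indices and, for each combination, check whether every value in those columns equals 1 by comparing the row sum with the number of columns (this relies on the data containing only 0s and 1s). Each combination gets a new column named "x/y/z...", where x, y and z are the indices of the columns compared; it holds 1 if they are all 1 and 0 otherwise. Careful: n columns give 2^n - n - 1 combinations of two or more columns (11 for the 4 columns here), so this gets expensive quickly as the number of columns grows.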

      A B C D 1/2 1/3 1/4 2/3 2/4 3/4 1/2/3 1/2/4 1/3/4 2/3/4 1/2/3/4
    1 0 1 0 0   0   0   0   0   0   0     0     0     0     0       0
    2 0 0 1 1   0   0   0   0   0   1     0     0     0     0       0
    3 0 0 1 1   0   0   0   0   0   1     0     0     0     0       0
    4 0 1 0 1   0   0   0   0   1   0     0     0     0     0       0
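
If you would rather have the new columns named after the original column names than after their indices, the same idea can be sketched in base R. The helper `add_combo_flags()` and its `sep` argument are hypothetical names made up for this sketch, not part of dplyr or any package, and the code assumes every column contains only 0s and 1s.

```r
# Hypothetical helper (illustrative only): adds one 0/1 indicator column per
# combination of two or more columns, named by the column names joined by `sep`.
add_combo_flags <- function(data, sep = "/") {
  nms <- names(data)
  if (length(nms) < 2) return(data)  # nothing to combine

  # Every combination of size 2..ncol(data), as a flat list of name vectors
  combos <- unlist(
    lapply(seq_along(nms)[-1], function(m) combn(nms, m, simplify = FALSE)),
    recursive = FALSE
  )

  # For each combination: 1 where all of its columns equal 1, else 0
  flags <- lapply(combos, function(cn) {
    as.integer(rowSums(data[cn] == 1) == length(cn))
  })
  names(flags) <- vapply(combos, paste, character(1), collapse = sep)

  # Append the indicator columns without mangling names such as "A/B"
  data[names(flags)] <- flags
  data
}

df <- data.frame(A = rep(0, 4),
                 B = c(1, 0, 0, 1),
                 C = c(0, 1, 1, 0),
                 D = c(0, 1, 1, 1))
res <- add_combo_flags(df)
```

For the example data this adds the same 11 indicator columns, named "A/B" through "A/B/C/D". The cost still grows exponentially with the number of columns, so the caveat above applies here too.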