My point of departure is the whigs data from the ggraph package. It contains an incidence matrix.
Now, for each combination of columns/variables, I'd like to know if all the columns are 1 or not, and create a new column for that combination with a 1 if indeed all the columns are 1 or a 0 if not.
The whigs data is just an example: I'm looking for a vectorized method that can be used regardless of the number of columns/combinations.
Using dplyr, I can use across() in the mutate() function to create multiple new columns, but I can't figure out how to create those columns on the basis of the various combinations of columns.
Also using dplyr, I can use c_across() in the mutate() function, in tandem with the rowwise() function, to create a single new column based on the values in multiple columns.
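For illustration, the two patterns might look roughly like this on a toy data frame. The data and the new column names (not_A, all_ABC, and so on) are made up for the example and are not taken from the whigs data.

```r
library(dplyr)

# Toy incidence matrix (made up for illustration)
toy <- data.frame(A = c(1, 0, 1), B = c(1, 1, 0), C = c(1, 0, 0))

# across(): one new column per existing column
flipped <- toy %>%
  mutate(across(A:C, ~ 1 - .x, .names = "not_{.col}"))

# rowwise() + c_across(): a single new column computed from several columns
all_one <- toy %>%
  rowwise() %>%
  mutate(all_ABC = as.numeric(all(c_across(A:C) == 1))) %>%
  ungroup()
```

Neither pattern by itself iterates over combinations of columns, which is the missing piece.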
Maybe these two can be combined somehow?
You could try
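library(dplyr)

df <- data.frame(A = rep(0, 4),
                 B = c(1, 0, 0, 1),
                 C = c(0, 1, 1, 0),
                 D = c(0, 1, 1, 1))

# Column indices of the original data (fixed before any columns are added)
cols <- seq_len(ncol(df))

# All combinations of 2 or more column indices, as a flat list of index vectors
combs <- unlist(lapply(cols[-1], function(m) {
  asplit(combn(cols, m = m), 2)
}), recursive = FALSE)

# Add one indicator column per combination: 1 if all its columns are 1, else 0
for (x in combs) {
  df <- df %>%
    mutate(!!paste0(x, collapse = "/") := as.numeric(rowSums(df[, x]) == length(x)))
}
df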
We create all combinations of column indices (of size two or more) and, for each combination, check whether every value in those columns equals 1 by comparing the row sum with the number of columns; this assumes the data contain only 0s and 1s. Each check adds a new column named "x/y/z...", where x, y and z are the indices of the columns compared, holding 1 if they are all 1 and 0 otherwise. Careful: this gets expensive as the number of columns grows, since n columns yield 2^n - n - 1 combinations.
  A B C D 1/2 1/3 1/4 2/3 2/4 3/4 1/2/3 1/2/4 1/3/4 2/3/4 1/2/3/4
1 0 1 0 0   0   0   0   0   0   0     0     0     0     0       0
2 0 0 1 1   0   0   0   0   0   1     0     0     0     0       0
3 0 0 1 1   0   0   0   0   0   1     0     0     0     0       0
4 0 1 0 1   0   0   0   0   1   0     0     0     0     0       0
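The same idea could also be wrapped in a function that takes the data frame as an argument and uses only base R. This is a sketch rather than a definitive solution: the function name indicator_cols and its sep argument are hypothetical, it names the new columns after the column names ("A/B") instead of the indices ("1/2"), and it tests each value against 1 directly, so it does not rely on the data containing only 0s and 1s.

```r
# Hypothetical helper: adds one 0/1 indicator column per combination of
# two or more columns of `df`; `sep` joins the column names in the new name.
indicator_cols <- function(df, sep = "/") {
  m <- as.matrix(df)
  n <- ncol(m)
  out <- list()
  for (k in seq_len(n)[-1]) {                      # combination sizes 2, ..., n
    for (i in combn(n, k, simplify = FALSE)) {     # i is a vector of column indices
      nm <- paste(colnames(m)[i], collapse = sep)
      # 1 if all k selected columns equal 1 in that row, else 0
      out[[nm]] <- as.numeric(rowSums(m[, i, drop = FALSE] == 1) == k)
    }
  }
  if (length(out) > 0) df[names(out)] <- out       # nothing to add for a single column
  df
}

df <- data.frame(A = rep(0, 4),
                 B = c(1, 0, 0, 1),
                 C = c(0, 1, 1, 0),
                 D = c(0, 1, 1, 1))

result <- indicator_cols(df)
```

The number of new columns still grows exponentially with the number of input columns, so this changes how the code is organised, not how much work it does.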