r, feature-engineering

Compute an operation over all pairwise combinations of variables in R


From a given dataframe:

# Create a data frame with 4 variables and 10 observations
set.seed(1)
df <- data.frame(replicate(4, sample(0:1, 10, replace = TRUE)))

I would like to subtract the columns from each other for every pair of columns, keeping only one difference per pair, i.e. column A - column B but not column B - column A, and so on.

What I have is entirely manual, which quickly becomes impractical when there are many variables.

# Result
df_result <- as.data.frame(list(df$X1-df$X2,
df$X1-df$X3,
df$X1-df$X4,

df$X2-df$X3,
df$X2-df$X4,

df$X3-df$X4))

Also, the name of each resulting column should describe the operation, e.g. x1_x2 for x1 - x2.


Solution

  • You can use combn:

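    # All unordered pairs of column names, one pair per column of the matrix
    COMBI = combn(colnames(df),2)
    # For each pair, subtract the second column from the first
    res = data.frame(apply(COMBI,2,function(i)df[,i[1]]-df[,i[2]]))
    # Name each result after its pair, e.g. "X1minusX2"
    colnames(res) = apply(COMBI,2,paste0,collapse="minus")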
    
    head(res)
      X1minusX2 X1minusX3 X1minusX4 X2minusX3 X2minusX4 X3minusX4
    1         0         0        -1         0        -1        -1
    2         1         1         0         0        -1        -1
    3         0         0         0         0         0         0
    4         0         0        -1         0        -1        -1
    5         1         1         1         0         0         0
    6        -1         0         0         1         1         0
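If you want the underscore naming from the question (X1_X2 rather than X1minusX2), or an operation other than subtraction, the same combn idea can be wrapped in a small function. This is only a sketch: `pairwise_op` and its `op` and `sep` arguments are hypothetical names made up for illustration, not part of base R or of the answer above.

```r
# Hypothetical helper: apply `op` to every unordered pair of columns and
# name each result "<first><sep><second>", e.g. "X1_X2".
pairwise_op <- function(data, op = `-`, sep = "_") {
  # List of length-2 character vectors, one per pair of column names
  pairs <- combn(colnames(data), 2, simplify = FALSE)
  # Apply the operation to the two columns of each pair
  out <- lapply(pairs, function(p) op(data[[p[1]]], data[[p[2]]]))
  # Build descriptive column names from the pair
  names(out) <- vapply(pairs, paste, character(1), collapse = sep)
  as.data.frame(out)
}

set.seed(1)
df <- data.frame(replicate(4, sample(0:1, 10, replace = TRUE)))

res <- pairwise_op(df)            # differences: X1_X2, X1_X3, ...
prod_res <- pairwise_op(df, `*`)  # same pairs, but products
```

Because combn only generates each pair once, in column order, you get A - B but never B - A, and the number of result columns is choose(ncol(df), 2).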