python machine-learning scikit-learn feature-extraction

Find all combinations of features

I need to transform my binary-coded feature matrix into a matrix that contains every possible feature interaction. By "every" I mean literally all combinations: every set of 2 features, every set of 3, every set of 4, and so on up to the set of all features.

Does anyone know of a way to do this with sklearn.preprocessing, or with another library?

Input this array into some function or method:

array([[0, 1, 1],
       [1, 0, 0],
       [1, 1, 1]])

And get this as output:

array([[0, 0, 1, 0],
       [0, 0, 0, 0],
       [1, 1, 1, 1]])

Each row in the new matrix represents [x1*x2, x1*x3, x2*x3, x1*x2*x3]

Solution

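What you want is known as the power set. Find the power set of your features (skipping the empty set and the single-feature subsets, since you only want interactions) and multiply the corresponding binary values within each subset. For 0/1 data that product is the same as a logical AND, so np.logical_and does the job. Here's how you could do this: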

Build the power set by generating all combinations of the features with lengths from 2 up to len(features)
Reduce each combination with np.logical_and.reduce
Append each result to a list, which ends up holding one entry per set in the power set

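from itertools import chain, combinations

import numpy as np

a = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 1]])

# One list per feature (i.e. per column of a)
features = a.T.tolist()
power_set = []
# All combinations of 2, 3, ..., len(features) features
for comb in chain.from_iterable(combinations(features, r)
                                for r in range(2, len(features) + 1)):
    # AND the selected features element-wise, then convert bool -> 0/1
    power_set.append(np.logical_and.reduce(comb).astype(int).tolist())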

Which will give you:

np.array(power_set).T

array([[0, 0, 1, 0],
       [0, 0, 0, 0],
       [1, 1, 1, 1]])
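
If you need this more than once, the same steps can be wrapped in a small helper. This is only a minimal sketch: the function name interaction_features and its min_order parameter are made up for illustration (they are not part of NumPy or scikit-learn), and it assumes X is a 2-D array of 0/1 values with at least min_order columns. It indexes columns instead of building Python lists and uses a product, which for binary data matches the logical AND above.

```python
from itertools import combinations

import numpy as np


def interaction_features(X, min_order=2):
    """Product of every column subset of size >= min_order.

    Hypothetical helper for illustration only. Assumes X is a 2-D
    array of 0/1 values with at least `min_order` columns.
    """
    X = np.asarray(X)
    n_features = X.shape[1]
    columns = []
    for r in range(min_order, n_features + 1):
        for idx in combinations(range(n_features), r):
            # For 0/1 data the product of the chosen columns equals their logical AND
            columns.append(X[:, list(idx)].prod(axis=1))
    # One output column per subset, in the order the subsets were generated
    return np.column_stack(columns)


a = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 1]])

result = interaction_features(a)
print(result)
```

Keep in mind that the number of output columns is 2**n - n - 1 for n features (4 columns for n = 3), so this gets large quickly. As for sklearn.preprocessing: PolynomialFeatures with interaction_only=True and degree set to the number of features produces these same products, alongside the original columns (and a bias column unless include_bias=False).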