I need to transform my binary-coded feature matrix into a matrix that consists of all possible combinations of feature interactions. By all I mean literally all combinations: every set of 2, every set of 3, every set of 4, and so on up to the set of all features.
Does anyone know if there is a way to do this with sklearn.preprocessing? Or with other libraries?
Input this array into some function or method:
array([[0, 1, 1],
       [1, 0, 0],
       [1, 1, 1]])
and get this as output:
array([[0, 0, 1, 0],
       [0, 0, 0, 0],
       [1, 1, 1, 1]])
Each row of the new matrix is [x1*x2, x1*x3, x2*x3, x1*x2*x3] computed from the corresponding input row, so each column holds one interaction.
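Put as a formula: for n binary features x_1, ..., x_n, the requested output has one column for each subset S of the features with at least two members, holding the product of those features. The number of columns follows from subtracting the empty set and the n single-feature subsets from all 2^n subsets:

$$
\prod_{j \in S} x_j \quad \text{for every } S \subseteq \{1, \dots, n\},\ |S| \ge 2,
\qquad
\#\text{columns} = \sum_{r=2}^{n} \binom{n}{r} = 2^n - n - 1 .
$$

For n = 3 that is 2^3 - 3 - 1 = 4 columns, which matches [x1*x2, x1*x3, x2*x3, x1*x2*x3].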
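What you want is known as the power set (here without the empty set and the single-feature subsets). So you want to find the power set of your features and then multiply the corresponding binary values, which for 0/1 data is the same as taking a logical AND (or a np.bitwise_and). So here's how you could do this: take the combinations of every size from 2 up to len(features) and apply np.logical_and.reduce to each of the sets in the power set:

import numpy as np
from itertools import chain, combinations

a = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 1]])

# Work column-wise: each inner list holds one feature across all samples.
features = a.T.tolist()
power_set = []
# Every combination of 2, 3, ..., len(features) feature columns.
for comb in chain.from_iterable(combinations(features, r)
                                for r in range(2, len(features) + 1)):
    # AND the selected columns element-wise, then view the bool result as int8 (0/1).
    power_set.append(np.logical_and.reduce(comb).view('i1').tolist())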
Which will give you:
np.array(power_set).T
array([[0, 0, 1, 0],
       [0, 0, 0, 0],
       [1, 1, 1, 1]])
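If you also want to know which features each output column came from, a variant is to iterate over column indices instead of column data and multiply the selected columns directly (for 0/1 values a product equals an AND). This is a minimal sketch assuming a dense NumPy array of 0/1 integers; the helper name `interaction_matrix` and its `min_order` parameter are hypothetical. Keep in mind that the output width roughly doubles with every feature you add, so this is only practical for a modest number of features.

```python
from itertools import combinations

import numpy as np


def interaction_matrix(X, min_order=2):
    """Hypothetical helper: products of every group of min_order..n columns.

    Returns the interaction matrix and the column-index tuple behind each
    output column, in itertools.combinations order (by size, then index).
    """
    X = np.asarray(X)
    n_features = X.shape[1]
    combos = [
        idx
        for r in range(min_order, n_features + 1)
        for idx in combinations(range(n_features), r)
    ]
    # X[:, list(idx)] picks the columns in this group; prod(axis=1) multiplies
    # them row by row, which for 0/1 data is the same as a logical AND.
    columns = [X[:, list(idx)].prod(axis=1) for idx in combos]
    return np.column_stack(columns), combos


a = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 1]])
result, combos = interaction_matrix(a)
print(result.tolist())  # [[0, 0, 1, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
print(combos)           # [(0, 1), (0, 2), (1, 2), (0, 1, 2)]
```

The combos list doubles as column labels: (0, 1) is x1*x2, (0, 1, 2) is x1*x2*x3, and so on. If you'd rather stay within scikit-learn, sklearn.preprocessing.PolynomialFeatures with degree set to the number of features, interaction_only=True and include_bias=False should produce the same products preceded by the original single-feature columns, which you would then slice off.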