python machine-learning scikit-learn feature-extraction

Find all combinations of features


I need to transform my binary-coded feature matrix into a matrix that consists of all possible combinations of feature interactions. By all I mean literally all combinations (every set of 2, every set of 3, every set of 4, and so on up to the set of all features).

Does anyone know if there is a way to do this with sklearn.preprocessing, or with another library?

Input this array into some function or method:

array([[0, 1, 1],
       [1, 0, 0],
       [1, 1, 1]])

And get this as output:

array([[0, 0, 1, 0],
       [0, 0, 0, 0],
       [1, 1, 1, 1]])

Each row in the new matrix represents [x1*x2, x1*x3, x2*x3, x1*x2*x3]


Solution

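  • What you want is known as the powerset, here restricted to subsets of two or more features. So you want to find the powerset of your features and then multiply the corresponding binary values within each subset, which for 0/1 data is the same as taking an element-wise logical AND (np.logical_and). So here's how you could do this: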

    • Obtain the powerset by finding all combinations of the features, from length 2 up to len(features)
    • Reduce each combination with np.logical_and.reduce
    • Append each result to a list, so the list ends up with one entry per set in the powerset

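    import numpy as np
    from itertools import chain, combinations
    
    a = np.array([[0, 1, 1],
                  [1, 0, 0],
                  [1, 1, 1]])
    
    # One list per feature (column) of a
    features = a.T.tolist()
    power_set = []
    # All combinations of 2, 3, ..., len(features) features
    for comb in chain.from_iterable(combinations(features, r)
                                    for r in range(2, len(features) + 1)):
        # AND the selected columns together; for 0/1 data this equals their product
        power_set.append(np.logical_and.reduce(comb).astype(int).tolist())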
    

    Which will give you:

    np.array(power_set).T
    
    array([[0, 0, 1, 0],
           [0, 0, 0, 0],
           [1, 1, 1, 1]])
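
If you need this more than once, the same idea can be wrapped in a small helper. The following is only a sketch: the function name interaction_features, its min_order parameter and the x1*x2-style labels are my own choices for illustration, not part of NumPy or scikit-learn. It works on column indices rather than column values, so it can also report which features went into each output column.

```python
from itertools import combinations

import numpy as np


def interaction_features(X, min_order=2):
    """AND together every subset of at least `min_order` columns of a 0/1 matrix.

    Hypothetical helper for illustration; returns (matrix, labels).
    """
    X = np.asarray(X).astype(bool)
    n_features = X.shape[1]
    columns, labels = [], []
    for r in range(min_order, n_features + 1):
        for idx in combinations(range(n_features), r):
            # A product of 0/1 values is 1 only when every factor is 1,
            # so the interaction is a row-wise logical AND.
            columns.append(X[:, list(idx)].all(axis=1))
            labels.append("*".join(f"x{i + 1}" for i in idx))
    if not columns:
        # Fewer than `min_order` features: there are no interactions to build.
        return np.empty((X.shape[0], 0), dtype=int), labels
    return np.column_stack(columns).astype(int), labels


a = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 1, 1]])
result, labels = interaction_features(a)
print(labels)  # ['x1*x2', 'x1*x3', 'x2*x3', 'x1*x2*x3']
print(result)
```

Keep in mind that n features produce 2**n - n - 1 interaction columns, so this is only practical for a modest number of features. If you would rather stay within sklearn.preprocessing, PolynomialFeatures with interaction_only=True and include_bias=False should, as far as I know, give the same product columns preceded by the original features when degree is set to the number of features.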