Tags: python, pandas, for-loop, operators, python-itertools

itertools: Getting combinations of operations ( + - * / ) and columns


Given a data frame of numeric values, I would like to apply addition, subtraction, multiplication and division to all combinations of columns.

What would be the fastest approach to do this for combinations of 3 and above?

A minimal reproducible example is given below with combinations of 2.

import numpy as np
import pandas as pd
from itertools import combinations
from itertools import permutations
from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2, so this needs an older version

# the dataset
X, y = load_boston(return_X_y=True)
X = pd.DataFrame(X)

combos2 = list(combinations(X.columns,2))
perm3 = list(permutations(X.columns,3))  # how would I do this without typing out all the permutations?
for i in combos2:
    X[f'{i[0]}_X_{i[1]}'] = X.iloc[:,i[0]]*X.iloc[:,i[1]]  # Multiply
    X[f'{i[0]}_+_{i[1]}'] = X.iloc[:,i[0]]+X.iloc[:,i[1]]  # Add
    X[f'{i[0]}_-_{i[1]}'] = X.iloc[:,i[0]]-X.iloc[:,i[1]]  # Subtract
    X[f'{i[0]}_/_{i[1]}'] = X.iloc[:,i[0]]/(X.iloc[:,i[1]]+1e-20)   # Divide

I was looking for a way to include the operators (+, *, -, /) in the combinations, so that this can be written in fewer lines than manually typing out every combination, but I don't know where to begin.

I would like all orders, i.e. (a * b + c), (a * b - c), (a * b / c), etc.

Ideally this would leave no duplicate columns, e.g. not both (a + b + c) and (c + b + a).

For example, if I had 3 columns a, b and c, I want a new column (a * b + c).


Solution

  • Here's a naive solution that outputs the combinations of 2 and 3 of all the columns.

    1. Build the list of column combinations.
    2. Write a function that applies an operator, using the operator module.
    3. Loop over the operators and the combinations.
    4. This may produce duplicate columns, so duplicates are dropped at the end.
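
    import pandas as pd
    import operator as op
    from itertools import combinations
    from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2

    X, y = load_boston(return_X_y=True)
    X = pd.DataFrame(X)

    comb = list(combinations(X.columns, 3))

    def operations(x, a, b):
        """Apply the operator named by the symbol x to a and b."""
        if x == '+':
            return op.add(a, b)
        if x == '-':
            return op.sub(a, b)
        if x == '*':
            return op.mul(a, b)
        if x == '/':
            return op.truediv(a, b + 1e-20)  # small offset avoids dividing by exactly 0
        raise ValueError(f'unknown operator: {x}')

    symbols = ['*', '/', '+', '-']
    for x in symbols:
        for z in symbols:  # named z so the target y is not overwritten
            for i in comb:
                a = X.iloc[:, i[0]].values
                b = X.iloc[:, i[1]].values
                c = X.iloc[:, i[2]].values
                d = operations(x, a, b)  # combination of 2
                e = operations(z, d, c)  # combination of 3, evaluated left to right
                X[f'{i[0]}{x}{i[1]}{z}{i[2]}'] = e
                X[f'{i[0]}{x}{i[1]}'] = d

    # keep only the first occurrence of any repeated column name
    X = X.loc[:, ~X.columns.duplicated()]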
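
As a possible variation on the loop above, here is a minimal sketch that generalises it to combinations of any size by pairing `itertools.combinations` (for the columns) with `itertools.product` (for the operator sequences). The names `OPS` and `expand` and the small `toy` frame are illustrative assumptions, not part of the original answer. It also collects the new columns in a dict and concatenates once, rather than inserting columns one at a time.

```python
from itertools import combinations, product
import operator as op

import pandas as pd

# Map each symbol to its function; a dict lookup replaces the if-chain.
OPS = {
    '+': op.add,
    '-': op.sub,
    '*': op.mul,
    '/': lambda a, b: a / (b + 1e-20),  # same small offset as in the answer
}


def expand(df, r):
    """Return df plus one column per (column combination, operator sequence)."""
    new_cols = {}
    for cols in combinations(df.columns, r):
        # r columns need r - 1 operators between them
        for symbols in product(OPS, repeat=r - 1):
            # Evaluate strictly left to right: ((a op1 b) op2 c) ...
            result = df[cols[0]].to_numpy()
            name = str(cols[0])
            for sym, col in zip(symbols, cols[1:]):
                result = OPS[sym](result, df[col].to_numpy())
                name += f'{sym}{col}'
            new_cols[name] = result
    # Build all new columns at once instead of inserting them one by one.
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)


# Hypothetical toy data standing in for the real dataset.
toy = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0], 'c': [5.0, 6.0]})
out = expand(toy, 3)

# Drop columns whose values exactly repeat an earlier column.
deduped = out.loc[:, ~out.T.duplicated().to_numpy()]
```

Column names here are read left to right, so `a+b*c` means `(a + b) * c` rather than following the usual operator precedence. Because `combinations` fixes the column order, an expression such as `a*c+b` is not produced. Swapping in `permutations` would give every order, at the cost of many more columns and of commutative repeats such as `a+b+c` and `c+b+a`. The value-based filter on the last line is one way to remove those, although floating-point rounding can let some near-duplicates through.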