Tags: python, pandas, dataframe, preprocessor, feature-engineering

Add selected interactions as columns to pandas dataframe


I'm fairly new to pandas and Python. I'm trying to compute a few selected interaction terms, out of all possible interactions in a data frame, and add them to the df as new features.

My solution was to calculate the interactions of interest using sklearn's PolynomialFeatures and attach them to the df in a for loop. See example:

import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

np.random.seed(1111)
a1 = np.random.randint(2, size = (5,3))
a2 = np.round(np.random.random((5,3)),2)

df = pd.DataFrame(np.concatenate([a1, a2], axis = 1), columns = ['a','b','c','d','e','f'])

combinations = [['a', 'e'], ['a', 'f'], ['b', 'f']]

for comb in combinations:
    polynomizer = PolynomialFeatures(interaction_only=True, include_bias=False).fit(df[comb])

    # get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
    newcol_nam = polynomizer.get_feature_names_out(comb)[2]
    newcol_val = polynomizer.transform(df[comb])[:,2]

    df[newcol_nam] = newcol_val

df
    a       b       c       d       e       f       a e     a f     b f
0   0.0     1.0     1.0     0.51    0.45    0.10    0.00    0.00    0.10
1   1.0     0.0     0.0     0.67    0.36    0.23    0.36    0.23    0.00
2   0.0     0.0     0.0     0.97    0.79    0.02    0.00    0.00    0.00
3   0.0     1.0     0.0     0.44    0.37    0.52    0.00    0.00    0.52
4   0.0     0.0     0.0     0.16    0.02    0.94    0.00    0.00    0.00

Another solution would be to run

PolynomialFeatures(2, interaction_only=True, include_bias=False).fit_transform(df)

and then drop the interactions I'm not interested in. However, neither option is ideal in terms of performance and I'm wondering if there is a better solution.


Solution

  • As suggested in the comments, you can multiply the columns directly:

    df = df.join(pd.DataFrame({
        f'{x} {y}': df[x]*df[y] for x,y in combinations
    }))
    

    Or simply:

    for comb in combinations:
        df[' '.join(comb)] = df[comb].prod(axis=1)
    

    Output:

         a    b    c     d     e     f   a e   a f   b f
    0  0.0  1.0  1.0  0.51  0.45  0.10  0.00  0.00  0.10
    1  1.0  0.0  0.0  0.67  0.36  0.23  0.36  0.23  0.00
    2  0.0  0.0  0.0  0.97  0.79  0.02  0.00  0.00  0.00
    3  0.0  1.0  0.0  0.44  0.37  0.52  0.00  0.00  0.52
    4  0.0  0.0  0.0  0.16  0.02  0.94  0.00  0.00  0.00
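Both variants work because a degree-2 interaction term is just the element-wise product of the two columns. If this is needed in several places, the idea could be wrapped in a small helper. The sketch below is one possible way to do that: the name `add_interactions` and the `sep` parameter are hypothetical, and the three-row frame is a cut-down version of the data above (rows 0, 1 and 3).

```python
import pandas as pd

def add_interactions(df, combinations, sep=" "):
    """Return a new frame with one product column per combination.

    Hypothetical helper: the input frame is left unmodified, and a
    combination may contain more than two column names.
    """
    new_cols = {
        sep.join(comb): df[list(comb)].prod(axis=1)  # row-wise product
        for comb in combinations
    }
    # Attach all new columns in one concat instead of one assignment per column
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)

df = pd.DataFrame({
    "a": [0.0, 1.0, 0.0],
    "b": [1.0, 0.0, 1.0],
    "e": [0.45, 0.36, 0.37],
    "f": [0.10, 0.23, 0.52],
})

out = add_interactions(df, [["a", "e"], ["b", "f"], ["a", "b", "f"]])
print(out)
```

Building all the columns first and concatenating once avoids repeatedly inserting into the frame, which may matter when the list of combinations is long.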