
Polynomial Expansion from scratch with numpy/python


I am building a polynomial regression without using Sklearn. I'm having trouble with Polynomial Expansion of features right now.

I have a dataframe with columns A and B. When I ran PolynomialFeatures(degree=2) from scikit-learn on it, I found that it returns 6 features.

I understand that 2 features become 6 because the expansion is (A + B + C)*(A + B + C), where C is the constant,

which becomes A^2 + 2AB + 2AC + 2BC + B^2 + C^2: 6 distinct terms, each giving one feature once the coefficients are dropped. I am trying to reproduce this with Python and NumPy.
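
One way to see where the six terms come from is to list every product of two factors drawn, with repetition allowed, from the constant and the two columns. The sketch below does this symbolically with `itertools.combinations_with_replacement`; the labels `"1"`, `"A"` and `"B"` are placeholders chosen for illustration, not columns from real data.

```python
from itertools import combinations_with_replacement

# Placeholder labels: the constant term plus the two columns.
factors = ["1", "A", "B"]

# Each unordered pair (repetition allowed) is one distinct degree-2 term.
terms = ["*".join(pair) for pair in combinations_with_replacement(factors, 2)]
print(terms)
```

With the constant taken as 1, these six products reduce to 1, A, B, A^2, AB and B^2.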

Since there is a constant term C, I added a new column C to my dataframe. However, I am stuck on how to proceed from here. I tried a for loop that runs (number of features * degree) times, but got confused about how to combine the features.

    def polynomial_expansion(features_df, order):
        ...  # expansion logic goes here
        return expanded_df

Can someone help me out? Which Python/NumPy/pandas method can I use for this? Thank you.


Solution

  • Here is a simple example of how to create polynomial features from scratch. The first part of the code produces the reference result with scikit-learn:

    from sklearn.preprocessing import PolynomialFeatures
    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame.from_dict({
        'x': [2],
        'y': [5],
        'z': [6]})
    
    # get_feature_names_out requires scikit-learn >= 1.0
    p = PolynomialFeatures(degree=2).fit(df)
    f = pd.DataFrame(p.transform(df), columns=p.get_feature_names_out(df.columns))
    print('deg 2\n', f)
    p = PolynomialFeatures(degree=3).fit(df)
    f = pd.DataFrame(p.transform(df), columns=p.get_feature_names_out(df.columns))
    print('deg 3\n', f)
    
    

    The result looks like:

    deg 2
          1    x    y    z  x^2   x y   x z   y^2   y z   z^2
    0  1.0  2.0  5.0  6.0  4.0  10.0  12.0  25.0  30.0  36.0
    deg 3
          1    x    y    z  x^2   x y   x z   y^2   y z   z^2  x^3  x^2 y  x^2 z  x y^2  x y z  x z^2    y^3  y^2 z  y z^2    z^3
    0  1.0  2.0  5.0  6.0  4.0  10.0  12.0  25.0  30.0  36.0  8.0   20.0   24.0   50.0   60.0   72.0  125.0  150.0  180.0  216.0
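
    The column counts above (10 for degree 2, 20 for degree 3) match the number of monomials of total degree at most d in n variables, which is the binomial coefficient C(n + d, d). A quick check, assuming Python 3.8 or later for `math.comb`; `n_poly_features` is a helper name introduced here for illustration:

```python
import math

def n_poly_features(n_features, degree):
    """Number of monomials of total degree <= degree, including the constant."""
    return math.comb(n_features + degree, degree)

print(n_poly_features(3, 2))  # 10
print(n_poly_features(3, 3))  # 20
print(n_poly_features(2, 2))  # 6, the two-column case from the question
```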
    

    To build the same features without scikit-learn, we can write:

    
    row = [2, 5, 6]
    n = len(row)
    
    # degree 0 (the constant) and degree 1 (the raw features)
    result = [1]
    result.extend(row)
    
    # degree 2: starting j at i visits each unordered pair exactly once,
    # so no value-based duplicate check is needed
    for i in range(n):
        for j in range(i, n):
            result.append(row[i] * row[j])
    print("deg 2", result)
    
    # degree 3: same idea with i <= j <= k
    for i in range(n):
        for j in range(i, n):
            for k in range(j, n):
                result.append(row[i] * row[j] * row[k])
    print("deg 3", result)
    

    The result looks like:

    deg 2 [1, 2, 5, 6, 4, 10, 12, 25, 30, 36]
    deg 3 [1, 2, 5, 6, 4, 10, 12, 25, 30, 36, 8, 20, 24, 50, 60, 72, 125, 150, 180, 216]
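
    Hand-written loops need one more nesting level for every extra degree. A minimal sketch of a version that works for any degree, assuming Python 3.8 or later for `math.prod`; the name `poly_row` is made up for this example. Because it combines positions rather than comparing values, a row such as [2, 4], where 2*2 equals the raw feature 4, still yields one entry per term.

```python
from itertools import combinations_with_replacement
from math import prod

def poly_row(row, degree):
    """Expand one row into all monomials of total degree 0..degree."""
    result = []
    for d in range(degree + 1):
        # Each combination picks d values with repetition, in index order;
        # the single empty combination at d == 0 gives the constant 1.
        for combo in combinations_with_replacement(row, d):
            result.append(prod(combo))
    return result

print("deg 2", poly_row([2, 5, 6], 2))
print("deg 3", poly_row([2, 5, 6], 3))
```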
    

    To get the same results recursively, you can use the following code:

    row = [2, 5, 6]
    
    def monomials(values, degree, start=0):
        # All products of `degree` values whose indices are non-decreasing
        # and >= start, so each monomial is generated exactly once.
        if degree == 0:
            return [1]
        terms = []
        for i in range(start, len(values)):
            for tail in monomials(values, degree - 1, i):
                terms.append(values[i] * tail)
        return terms
    
    def poly_feats(input_values, degree):
        # Features up to `degree` = features up to degree - 1, plus the
        # monomials of exactly that degree. The input list is not modified.
        if degree == 0:
            return [1]
        return poly_feats(input_values, degree - 1) + monomials(input_values, degree)
    
    print('deg 2', poly_feats(row, 2))
    print('deg 3', poly_feats(row, 3))
    

    And the results will be:

    deg 2 [1, 2, 5, 6, 4, 10, 12, 25, 30, 36]
    deg 3 [1, 2, 5, 6, 4, 10, 12, 25, 30, 36, 8, 20, 24, 50, 60, 72, 125, 150, 180, 216]
    

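    To apply the function to every row of a pandas DataFrame and get the expanded features back as a new DataFrame, you can use the following:

    def get_poly_feats(df, degree):
        # Expand each row and collect the results, keyed by the row's index label.
        result = {}
        for index, row in df.iterrows():
            result[index] = poly_feats(row.tolist(), degree)
        # One row per input row, one column per polynomial feature.
        return pd.DataFrame.from_dict(result, orient='index')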
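
    Iterating row by row gets slow on large frames. As a rough sketch of a column-wise alternative, the function below builds each feature as a product of whole NumPy columns and labels it. It borrows the `polynomial_expansion(features_df, order)` signature from the question's stub; the `"x*y"` label format is a convention chosen here, not scikit-learn's naming, and the two-row `df` is made-up sample data.

```python
from itertools import combinations_with_replacement

import numpy as np
import pandas as pd

def polynomial_expansion(features_df, order):
    """Return a DataFrame with every monomial of the columns up to `order`."""
    X = features_df.to_numpy(dtype=float)
    names = list(features_df.columns)
    columns = {}
    for d in range(order + 1):
        # Index tuples with non-decreasing entries: one tuple per monomial.
        for idx in combinations_with_replacement(range(len(names)), d):
            # Label such as "x", "x*y" or "x*x*z"; the empty tuple is the constant.
            label = "*".join(str(names[i]) for i in idx) or "1"
            # Product across the selected columns; an empty selection gives ones.
            columns[label] = X[:, list(idx)].prod(axis=1)
    return pd.DataFrame(columns, index=features_df.index)

df = pd.DataFrame({"x": [2, 3], "y": [5, 1], "z": [6, 4]})
expanded_df = polynomial_expansion(df, 2)
print(expanded_df)
```

    The first row of `expanded_df` holds the same ten values as the degree-2 list above, in the same order.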