Tags: python, scikit-learn, linear-regression, preprocessor, polynomial-approximations

Should I use feature scaling with polynomial regression with scikit-learn?


I have been playing around with lasso regression on polynomial functions using the code below. My question is: should I be doing feature scaling as part of lasso regression when fitting a polynomial function? The R^2 results and the plot produced by the code suggest not. I'd appreciate any advice on why that is the case, or on whether I have fundamentally stuffed something up. Thanks in advance.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


np.random.seed(0)
n = 15
x = np.linspace(0,10,n) + np.random.randn(n)/5
y = np.sin(x)+x/6 + np.random.randn(n)/10


X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

def answer_regression():
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import Lasso, LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.preprocessing import MinMaxScaler
    import matplotlib.pyplot as plt
    scaler = MinMaxScaler()
    global X_train, X_test, y_train, y_test

    degrees = 12
    poly = PolynomialFeatures(degree=degrees)
    X_train_poly = poly.fit_transform(X_train.reshape(-1,1))
    X_test_poly = poly.transform(X_test.reshape(-1,1))

    #Lasso Regression Model
    X_train_scaled = scaler.fit_transform(X_train_poly)
    X_test_scaled = scaler.transform(X_test_poly)

    #No feature scaling
    linlasso = Lasso(alpha=0.01, max_iter = 10000).fit(X_train_poly, y_train)
    y_test_lassopredict = linlasso.predict(X_test_poly)
    Lasso_R2_test_score = r2_score(y_test, y_test_lassopredict)

    #With feature scaling
    linlasso = Lasso(alpha=0.01, max_iter = 10000).fit(X_train_scaled, y_train)
    y_test_lassopredict_scaled = linlasso.predict(X_test_scaled)
    Lasso_R2_test_score_scaled = r2_score(y_test, y_test_lassopredict_scaled)

    # %matplotlib notebook  (Jupyter-only magic; omit when running as a plain script)
    plt.figure()

    plt.scatter(X_test, y_test, label='Test data')
    plt.scatter(X_test, y_test_lassopredict, label='Predict data - No Scaling')
    plt.scatter(X_test, y_test_lassopredict_scaled, label='Predict data - With Scaling')
    plt.legend()

    return (Lasso_R2_test_score, Lasso_R2_test_score_scaled)

answer_regression()

Solution

  • Your x range is roughly [0, 10], so the degree-12 polynomial features span a vastly wider range (x^12 reaches about 10^12). Without scaling, the fitted weights on those features are already tiny (to compensate for the huge feature values), so the L1 penalty barely touches them and Lasso does not zero them out. After MinMax scaling to [0, 1], the weights needed to fit the data become much larger, and with alpha=0.01 Lasso sets most of them to zero. That is why the scaled case predicts poorly: the features it zeroed are needed to capture the true trend of y.

    You can confirm this by inspecting the weights (linlasso.coef_) in both cases: most of the weights in the second (scaled) case are set to zero.

    It also seems your alpha is larger than optimal and should be tuned, e.g. by cross-validation. If you decrease alpha, you will get similar results for both cases.
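
    A minimal sketch of both checks, reusing the question's data setup: count the non-zero coefficients in the raw and scaled fits, then let `LassoCV` pick alpha on the scaled features. (The exact counts and the chosen alpha depend on the random data, so treat the printed values as illustrative.)

    ```python
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import PolynomialFeatures, MinMaxScaler
    from sklearn.linear_model import Lasso, LassoCV

    # Recreate the question's toy data
    np.random.seed(0)
    n = 15
    x = np.linspace(0, 10, n) + np.random.randn(n) / 5
    y = np.sin(x) + x / 6 + np.random.randn(n) / 10
    X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)

    poly = PolynomialFeatures(degree=12)
    X_train_poly = poly.fit_transform(X_train.reshape(-1, 1))
    X_test_poly = poly.transform(X_test.reshape(-1, 1))

    scaler = MinMaxScaler()
    X_train_scaled = scaler.fit_transform(X_train_poly)
    X_test_scaled = scaler.transform(X_test_poly)

    # How many coefficients survive in each case?
    raw = Lasso(alpha=0.01, max_iter=10000).fit(X_train_poly, y_train)
    scaled = Lasso(alpha=0.01, max_iter=10000).fit(X_train_scaled, y_train)
    print("non-zero coefs, raw:   ", int(np.sum(raw.coef_ != 0)))
    print("non-zero coefs, scaled:", int(np.sum(scaled.coef_ != 0)))

    # Tune alpha on the scaled features with cross-validation
    cv = LassoCV(cv=3, max_iter=100000).fit(X_train_scaled, y_train)
    print("best alpha:", cv.alpha_)
    print("test R^2 with tuned alpha:", cv.score(X_test_scaled, y_test))
    ```

    With only 11 training points a 3-fold `LassoCV` is coarse, but it is enough to show that a smaller alpha restores the scaled model's fit.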