Tags: python, machine-learning, scikit-learn, lasso-regression

How to build a predict function with Lasso and RobustScaler?


I'm trying to figure out how to predict values with LASSO regression without using the .predict function that Sklearn provides. This is basically just to broaden my understanding of how LASSO works internally. I asked a question on Cross Validated about how LASSO regression works, and one of the comments mentioned how the predict function works the same as in Linear Regression. Because of this, I wanted to try and make my own function to do this.

I was able to successfully recreate the predict function in simpler examples, but when I try to use it in conjunction with RobustScaler, I keep getting different outputs. With this example, I'm getting the prediction as 4.33 with Sklearn, and 6.18 with my own function. What am I missing here? Am I not inverse transforming the prediction correctly at the end?

import pandas as pd
from sklearn.preprocessing import RobustScaler
from sklearn.linear_model import Lasso
import numpy as np

df = pd.DataFrame({'Y':[5, -10, 10, .5, 2.5, 15], 'X1':[1., -2.,  2., .1, .5, 3], 'X2':[1, 1, 2, 1, 1, 1], 
              'X3':[6, 6, 6, 5, 6, 4], 'X4':[6, 5, 4, 3, 2, 1]})

X = df[['X1','X2','X3','X4']]
y = df[['Y']]

#Scaling 
transformer_x = RobustScaler().fit(X)
transformer_y = RobustScaler().fit(y) 
X_scal = transformer_x.transform(X)
y_scal = transformer_y.transform(y)

#LASSO
lasso = Lasso()
lasso = lasso.fit(X_scal, y_scal)

#LASSO info
print('Score: ', lasso.score(X_scal,y_scal))
print('Raw Intercept: ', lasso.intercept_.round(2)[0]) 
intercept = transformer_y.inverse_transform([lasso.intercept_])[0][0]
print('Unscaled Intercept: ', intercept) 
print('\nCoefficients Used: ')
coeff_array = lasso.coef_
inverse_coeff_array = transformer_x.inverse_transform(lasso.coef_.reshape(1,-1))[0]
for i,j,k in zip(X.columns, coeff_array, inverse_coeff_array):
    if j != 0:
        print(i, j.round(2), k.round(2))

#Predictions
example = [[3,1,1,1]]
pred = lasso.predict(example)
pred_scal = transformer_y.inverse_transform(pred.reshape(-1, 1))
print('\nRaw Prediction where X1 = 3: ', pred[0])
print('Unscaled Prediction where X1 = 3: ', pred_scal[0][0])

#Predictions without using the .predict function 
def lasso_predict_value_(X1,X2,X3,X4): 
    print('intercept: ', intercept)
    print('coef: ', inverse_coeff_array[0])
    print('X1: ', X1)
    preds = intercept + inverse_coeff_array[0]*X1
    print('Your predicted value is: ', preds)

lasso_predict_value_(3,1,1,1)

Solution

  • The trained Lasso has no information about whether a given data point is scaled. Your manual prediction should therefore leave scaling out of the model itself and use the coefficients and intercept exactly as the model stores them.

    If I remove your processing of the model coefficients, the manual calculation reproduces the result of the sklearn model:

    
    example = [[3,1,1,1]]
    lasso.predict(example)
    
    # array([0.07533937])
    
    
    #Predictions without using the .predict function 
    def lasso_predict_value_(X1,X2,X3,X4): 
        x_test = np.array([X1,X2, X3, X4])
        preds = lasso.intercept_ + sum(x_test*lasso.coef_)
        print('Your predicted value is: ', preds)
    
    
    lasso_predict_value_(3,1,1,1)
    
    # Your predicted value is:  [0.07533937]
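
The same rule extends to many rows at once: a fitted linear model predicts the intercept plus the dot product of the features and the coefficients. The sketch below is only an illustration under assumptions: the toy data, the alpha value, and the helper name manual_predict are hypothetical and not part of the question.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy data (hypothetical values, chosen only for illustration)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 4))
y_demo = 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 2] + rng.normal(scale=0.1, size=50)

model = Lasso(alpha=0.1).fit(X_demo, y_demo)

def manual_predict(model, X):
    """Linear prediction: features times coefficients, plus the intercept."""
    return np.asarray(X) @ model.coef_ + model.intercept_

manual = manual_predict(model, X_demo)

# The manual result should match sklearn's own predict for every row.
print(np.allclose(manual, model.predict(X_demo)))
```

The model must receive inputs in the same space it was trained on; here that is the raw toy data, because no scaler is involved.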
    
    

    Update 2:

    "Once I use LASSO, I then need to see what my predictions were in their original units. My dependent variable is in dollar amounts, and if I don't inverse transform it back, I'm unable to see how many dollars I need for the prediction."

    This is a very valid scenario. Scale the input with transformer_x.transform before predicting, then apply transformer_y.inverse_transform to the prediction to get the unscaled dollar amount. There is no need to alter the model weights.

    Updated example

    example = [[3,1,1,1]]
    scaled_pred = lasso.predict(transformer_x.transform(example))
    transformer_y.inverse_transform([scaled_pred])
    # array([[4.07460407]])
    
    #Predictions without using the .predict function 
    def lasso_predict_value_(X1,X2,X3,X4): 
        x_test = transformer_x.transform(np.array([X1,X2, X3, X4]).reshape(1,-1))[0]
        preds = lasso.intercept_ + sum(x_test*lasso.coef_)
        print('Your predicted value is: ', preds)
        # Inverse-transform the manual prediction itself, not the global scaled_pred
        print('Your unscaled predicted value is: ', 
              transformer_y.inverse_transform([preds]))
    
    
    lasso_predict_value_(3,1,1,1)
    # Your predicted value is:  [0.0418844]    
    # Your unscaled predicted value is:  [[4.07460407]]
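
If coefficients in original units are still wanted, for example to read them as dollars per unit of a feature, the two scalers can be folded into the weights algebraically instead of calling inverse_transform on the coefficients. RobustScaler computes (value - center_) / scale_, so substituting that into the linear model gives the expressions below. This is a sketch under assumptions: the toy data, the alpha value, and the names coef_orig, intercept_orig and predict_original_units are hypothetical, and both scalers are assumed to use their default centering and scaling.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import RobustScaler

# Toy data (hypothetical values, chosen only for illustration)
rng = np.random.default_rng(0)
X_raw = rng.normal(loc=5.0, scale=3.0, size=(60, 4))
y_raw = 10.0 + 4.0 * X_raw[:, 0] - 2.0 * X_raw[:, 3] + rng.normal(scale=0.5, size=60)

scaler_x = RobustScaler().fit(X_raw)
scaler_y = RobustScaler().fit(y_raw.reshape(-1, 1))

X_scaled = scaler_x.transform(X_raw)
y_scaled = scaler_y.transform(y_raw.reshape(-1, 1)).ravel()

model = Lasso(alpha=0.05).fit(X_scaled, y_scaled)

# Fitted scaler parameters: per-feature center and scale for X, scalars for y
cx, sx = scaler_x.center_, scaler_x.scale_
cy, sy = scaler_y.center_[0], scaler_y.scale_[0]

# y = cy + sy * (b0 + sum_j w_j * (x_j - cx_j) / sx_j), regrouped by x_j
coef_orig = model.coef_ * sy / sx
intercept_orig = cy + sy * (model.intercept_ - np.sum(model.coef_ * cx / sx))

def predict_original_units(X):
    """Predict directly from unscaled features using the folded weights."""
    return np.asarray(X) @ coef_orig + intercept_orig

# Reference: scale the inputs, predict, then inverse-transform the output
reference = scaler_y.inverse_transform(
    model.predict(scaler_x.transform(X_raw)).reshape(-1, 1)
).ravel()

print(np.allclose(predict_original_units(X_raw), reference))
```

The key difference from calling inverse_transform on the coefficients is that a slope is only multiplied by the ratio of the scales, while the centers affect the intercept alone.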