Tags: python, lasso-regression

Getting negative R squared with lasso regression, Python


I ran a lasso regression, but I got a negative R squared. Here is my code:

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

# df is my DataFrame; 'var' is the target column
X = df.drop('var', axis=1)
y = df['var']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=10)

lambdas = (0.001, 0.01, 0.1, 0.5, 1, 2, 10)
l_num = len(lambdas)
pred_num = X.shape[1]

# arrays to collect the coefficients and R^2 scores for each alpha
coeff_a = np.zeros((l_num, pred_num))
train_r_squared = np.zeros(l_num)
test_r_squared = np.zeros(l_num)

# enumerate through lambdas with index ind and value i
for ind, i in enumerate(lambdas):
    reg = Lasso(alpha=i)
    reg.fit(X_train, y_train)

    coeff_a[ind, :] = reg.coef_
    train_r_squared[ind] = reg.score(X_train, y_train)
    test_r_squared[ind] = reg.score(X_test, y_test)

When I print test_r_squared[ind], I get -0.8086.

Why is that? Any help would be appreciated. Thanks.


Solution

  • It is entirely possible to get an R^2 value less than 0.

    R^2 is a metric used to measure the performance of a regressor, and its best possible score is 1. A regressor that always predicts the mean of the target scores exactly 0. Perhaps unexpectedly, a model can perform even worse than that. Here is why:

    R^2 is given by:

    R^2 = 1 - FVU
    

    where FVU (Fraction of Variance Unexplained) is the ratio of the residual sum of squares to the variance of the data. So when the residual sum of squares is larger than the variance of the data, FVU > 1 and therefore R^2 < 0. This can happen when you choose the wrong model or nonsensical parameters.

    In short, you can get an R^2 score below 0, and it means your model fits the test data worse than a constant prediction of the mean. That usually points to the wrong model or badly chosen parameters; with Lasso, an alpha that is too large shrinks every coefficient toward zero. The two sketches below illustrate both points.
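
    To make the formula concrete, here is a minimal sketch on made-up synthetic data (illustrative only, not your dataset). It computes R^2 by hand as 1 - SS_res/SS_tot and compares it with Lasso.score. With a deliberately oversized alpha, the Lasso shrinks every coefficient to zero and becomes a constant predictor, so its test R^2 falls to roughly 0 or below:

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import train_test_split

    # made-up regression data, for illustration only
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([3.0, -2.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=200)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=10)

    for alpha in (0.01, 0.5, 100.0):   # illustrative alphas; 100 is deliberately too large
        reg = Lasso(alpha=alpha).fit(X_train, y_train)
        pred = reg.predict(X_test)

        ss_res = np.sum((y_test - pred) ** 2)           # residual sum of squares
        ss_tot = np.sum((y_test - y_test.mean()) ** 2)  # variance of the data (times n)
        r2_manual = 1 - ss_res / ss_tot                 # R^2 = 1 - FVU

        print(f"alpha={alpha}: manual R^2={r2_manual:.4f}, score={reg.score(X_test, y_test):.4f}")

    The manual value and reg.score agree, and the oversized alpha drives the test score to about 0 or slightly negative, because the intercept alone (the mean of y_train) is not quite the mean of y_test.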
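
    If the negative score comes from a badly chosen alpha, one common remedy (a suggestion, not part of your original code) is to let cross-validation pick it with scikit-learn's LassoCV, reusing your train/test split and candidate alphas:

    from sklearn.linear_model import LassoCV

    # cross-validate over the same candidate alphas as in the question
    reg_cv = LassoCV(alphas=[0.001, 0.01, 0.1, 0.5, 1, 2, 10], cv=5)
    reg_cv.fit(X_train, y_train)

    print("best alpha:", reg_cv.alpha_)
    print("test R^2:", reg_cv.score(X_test, y_test))

    If your features sit on very different scales, standardizing them first (e.g. with StandardScaler in a Pipeline) also tends to make the alpha search better behaved, since Lasso penalizes all coefficients equally.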