python machine-learning classification roc

How to plot a ROC curve for a Lasso Regression model in python


Lasso, although it's a regression algorithm, can be used as a classifier. Therefore, there should be a way to make a ROC curve and find its AUC.

This is my code for making the model, scaling and standardizing it:
X = data.drop(['Response'], axis = 1)
Y = data.Response

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state = 42)

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler() 
scaled_X_train = scaler.fit_transform(X_train)
scaled_X_test = scaler.transform(X_test) 

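import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipline = Pipeline([
    ('scaler', StandardScaler()),
    # normalize= was removed in scikit-learn 1.2; the StandardScaler step already standardizes
    ('model', Lasso())
])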

lasso_model = GridSearchCV(pipline,
                           {'model__alpha': np.arange(0, 3, 0.05)},
                           cv = 10 ,
                           scoring = 'roc_auc',
                           verbose = 3,
                           n_jobs = -1,
                           error_score = 'raise')

lasso_model.fit(scaled_X_train, Y_train)

Now trying to make the ROC curves, one for the training set and one for the test set:

import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

# Make a line for the random classification, AUC = 0.5:
r_probs = [0 for _ in range(len(Y_test))]
r_auc = roc_auc_score(Y_test,r_probs)
r_fpr, r_tpr , _ = roc_curve(Y_test, r_probs)

y_pred_proba = lasso_model.predict_proba(scaled_X_train)[::,1]
fpr, tpr, _ = roc_curve(Y_train,  y_pred_proba)
auc = roc_auc_score(Y_train, y_pred_proba)

#create ROC curve

plt.plot(r_fpr, r_tpr, linestyle = '--', label = 'Random Prediction (AUROC = %0.3f)' %r_auc)
plt.plot(fpr,tpr,label="AUC="+str(auc))
plt.title('Train set ROC')
plt.ylabel('True Positive Rate')
plt.xlabel('False Positive Rate')
plt.legend(loc=4)
plt.show()

I get an error saying that predict_proba does not exist for Lasso, since it is not a classification algorithm. So how can I plot a ROC curve?


Solution

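  • Lasso is basically a linear model with L1 regularization. You can enable the same kind of regularization in LogisticRegression() with penalty='l1', together with a solver that supports it such as solver='liblinear' or solver='saga'. That gives you a real classifier with predict_proba (see this question, for example).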

    Alternatively, you can try to duct tape a sigmoid function to your Lasso() output, but that'd be doing the same thing as above with much more effort.
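
Both routes from the answer can be sketched roughly as follows. This is only an illustration under stated assumptions: the data is a synthetic stand-in generated with make_classification (the asker's `data` is not available), the C grid and the Lasso alpha are arbitrary picks, and roc_points and plot_roc are hypothetical helper names, not part of scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic binary data standing in for the asker's dataset.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Route 1: L1-penalized logistic regression, which provides predict_proba.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", LogisticRegression(penalty="l1", solver="liblinear")),
])
# C is the inverse regularization strength: smaller C means a stronger penalty.
search = GridSearchCV(pipeline, {"model__C": np.logspace(-2, 2, 10)},
                      cv=5, scoring="roc_auc")
search.fit(X_train, y_train)


def roc_points(model, X, y):
    """Return (fpr, tpr, auc) using the predicted probability of class 1."""
    scores = model.predict_proba(X)[:, 1]
    fpr, tpr, _ = roc_curve(y, scores)
    return fpr, tpr, roc_auc_score(y, scores)


train_fpr, train_tpr, train_auc = roc_points(search, X_train, y_train)
test_fpr, test_tpr, test_auc = roc_points(search, X_test, y_test)

# Route 2: keep Lasso and feed its continuous output to the ROC functions.
# A ROC curve only needs a ranking score, so the sigmoid is optional here:
# it is monotone and therefore leaves the ranking (and the AUC) unchanged.
lasso = Pipeline([("scaler", StandardScaler()), ("model", Lasso(alpha=0.01))])
lasso.fit(X_train, y_train)
lasso_scores = lasso.predict(X_test)
lasso_auc = roc_auc_score(y_test, lasso_scores)
sigmoid_auc = roc_auc_score(y_test, 1.0 / (1.0 + np.exp(-lasso_scores)))


def plot_roc(fpr, tpr, auc, title):
    """Plot one ROC curve against the diagonal of a random classifier."""
    import matplotlib.pyplot as plt
    plt.plot([0, 1], [0, 1], linestyle="--", label="Random prediction (AUC = 0.500)")
    plt.plot(fpr, tpr, label=f"Model (AUC = {auc:.3f})")
    plt.title(title)
    plt.xlabel("False Positive Rate")
    plt.ylabel("True Positive Rate")
    plt.legend(loc=4)
    plt.show()
```

Calling plot_roc(train_fpr, train_tpr, train_auc, 'Train set ROC') and plot_roc(test_fpr, test_tpr, test_auc, 'Test set ROC') would then draw the two curves the question asks for. Note that the diagonal is drawn directly from (0, 0) to (1, 1) rather than computed from constant predictions.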