python linear-regression roc auc

ROC curve from the linear regression model made of continuous values


I want to build a model on continuous values. First, I split the data:

from sklearn.model_selection import train_test_split

X = data[col_list]
y = data['death rate']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

Then I built the model using 'from sklearn.linear_model import LinearRegression'.

#instantiate the model
lin_regression = LinearRegression()

#fit the model using the training data
lin_regression.fit(X_train,y_train)

#define metrics
y_predicted = lin_regression.predict(X_test)
fpr, tpr, _ = metrics.roc_curve(y_test,  y_predicted)

But the code didn't work. It said 'ValueError: continuous format is not supported'.

After that, I tried 'from sklearn import svm' instead.

random_state = np.random.RandomState(0)

#instantiate the model
classifier = OneVsRestClassifier(
    svm.SVC(kernel="linear", probability=True, random_state=random_state)
)

#fit the model using the training data
y_score = classifier.fit(X_train, y_train).decision_function(X_test)

But it still didn't work; this time it raised 'ValueError: Unknown label type'. I then found that the original y data in the example I was following (from the site I referenced) is an (n x 3) array of binary values, for example y_train=[[0,1,1],[0,1,0],...].
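One way to see why both attempts fail is to check how scikit-learn classifies each kind of target. The sketch below uses small made-up arrays (not the question's data) and `sklearn.utils.multiclass.type_of_target`: a float target such as a death rate is reported as 'continuous', which neither `roc_curve` nor a classifier like `SVC` accepts as labels, whereas the binary and (n x 3) indicator formats are recognized as classification targets.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical arrays, for illustration only.
y_continuous = np.array([0.12, 3.4, 2.75, 0.9])   # e.g. a rate column
y_binary = np.array([0, 1, 1, 0])                 # a single 0/1 label
y_multilabel = np.array([[0, 1, 1], [0, 1, 0]])   # (n x 3) binary indicator

# Map each example to the target type scikit-learn infers for it.
kinds = {
    "continuous": type_of_target(y_continuous),
    "binary": type_of_target(y_binary),
    "multilabel": type_of_target(y_multilabel),
}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```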

My questions are:

  1. Can a linear regression model have a ROC curve?
  2. If so, how do I produce it in Python?

Solution

  • You can't compute a ROC curve directly from a regression model: with a continuous target there are no positive and negative classes, so true positives, true negatives, false positives and false negatives are undefined. The only workaround is to choose a threshold and binarize the y variable, using the continuous predictions as scores:

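    # threshold is a cutoff on the target that you must choose yourself
    y_bin = (y_test >= threshold).astype(int)

    # the continuous predictions serve as scores for the binarized labels
    fpr, tpr, _ = metrics.roc_curve(y_bin, y_predicted)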
    

    Otherwise, you can apply other metrics, such as those described in: https://www.sciencedirect.com/science/article/pii/S0031320313002665
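The thresholding approach can be sketched end to end as follows. This is a minimal illustration, not the asker's actual setup: the data are synthetic, and using the training-set median as the threshold is an assumption made only so both classes are populated. In practice the cutoff should come from the problem domain.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data (hypothetical values).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Fit the regression and get continuous predictions on the test set.
model = LinearRegression().fit(X_train, y_train)
y_predicted = model.predict(X_test)

# Assumption: "positive" means a target at or above the training median.
threshold = np.median(y_train)
y_bin = (y_test >= threshold).astype(int)

# The continuous predictions act as scores for the binarized labels.
fpr, tpr, thresholds = roc_curve(y_bin, y_predicted)
auc = roc_auc_score(y_bin, y_predicted)
print(f"AUC = {auc:.3f}")
```

Note that the resulting curve describes how well the regression output ranks samples above versus below the chosen cutoff, so it changes whenever the threshold does; it is not a property of the regression model alone.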