I want to build a model that predicts continuous values. Before modeling, I split the data:
from sklearn.model_selection import train_test_split
X = data[col_list]
y = data['death rate']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
Then I built the model with LinearRegression (from sklearn.linear_model):
from sklearn import metrics
from sklearn.linear_model import LinearRegression
#instantiate the model
lin_regression = LinearRegression()
#fit the model using the training data
lin_regression.fit(X_train, y_train)
#predict on the test set and compute the ROC curve
y_predicted = lin_regression.predict(X_test)
fpr, tpr, _ = metrics.roc_curve(y_test, y_predicted)
But this raised 'ValueError: continuous format is not supported'.
Next, I tried an SVM classifier from sklearn (from sklearn import svm) instead:
import numpy as np
from sklearn import svm
from sklearn.multiclass import OneVsRestClassifier
random_state = np.random.RandomState(0)
#instantiate the model
classifier = OneVsRestClassifier(
    svm.SVC(kernel="linear", probability=True, random_state=random_state)
)
#fit the model using the training data and score the test set
y_score = classifier.fit(X_train, y_train).decision_function(X_test)
But it still failed with 'ValueError: Unknown label type'. I then noticed that in the site I referenced, the original y is an (n x 3) array of binary values, for example y_train = [[0,1,1],[0,1,0],...], whereas my y is continuous.
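Both errors come down to how scikit-learn interprets the target. Its helper type_of_target (in sklearn.utils.multiclass) reports that interpretation; the quick check below uses made-up arrays (a hypothetical death-rate column, the (n x 3) format from the referenced example, and a plain 0/1 column) to show that a continuous target is classified as 'continuous', which neither roc_curve nor SVC accepts, while the (n x 3) array is a 'multilabel-indicator' target.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Made-up continuous target, e.g. death rates
y_continuous = np.array([0.12, 0.57, 0.33, 0.91])
# (n x 3) binary indicator format, as in the referenced example
y_indicator = np.array([[0, 1, 1], [0, 1, 0], [1, 0, 0]])
# A single column of 0/1 labels
y_binary = np.array([0, 1, 1, 0])

for name, y in [("continuous", y_continuous),
                ("indicator", y_indicator),
                ("binary", y_binary)]:
    # type_of_target returns a string describing the inferred target type
    print(name, "->", type_of_target(y))
```

Only the binary (or multilabel-indicator) kinds are valid inputs for a ROC curve.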
My question is
You can't compute a ROC curve directly from a regression model's continuous targets, because true positives, true negatives, false positives and false negatives are undefined without binary labels. One solution is to choose a threshold and binarize the true y values; the continuous predictions can still be used as scores:
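import numpy as np
from sklearn import metrics
# threshold: a cut-off you choose on the death-rate scale
y_bin = (np.asarray(y_test) >= threshold).astype(int)  # 1 if at or above threshold, else 0
fpr, tpr, _ = metrics.roc_curve(y_bin, y_predicted)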
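A minimal end-to-end sketch of the thresholding approach, assuming synthetic data from make_regression in place of data[col_list] and data['death rate'], and assuming the median of the training targets as the cut-off (a hypothetical choice; any domain-meaningful threshold works). The regression output is never binarized; only the ground truth is.

```python
import numpy as np
from sklearn import metrics
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real features and death-rate target (hypothetical)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

lin_regression = LinearRegression()
lin_regression.fit(X_train, y_train)
y_predicted = lin_regression.predict(X_test)  # continuous scores

# Hypothetical cut-off: a case is "positive" if its true value is at or above the training median
threshold = np.median(y_train)
y_bin = (y_test >= threshold).astype(int)

# Binary ground truth plus continuous scores is what roc_curve expects
fpr, tpr, thresholds = metrics.roc_curve(y_bin, y_predicted)
auc = metrics.roc_auc_score(y_bin, y_predicted)
print(f"AUC = {auc:.3f}")
```

The resulting curve depends on the threshold, so it describes how well the model ranks cases above versus below that particular cut-off, not its overall regression accuracy.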
Otherwise, you can apply other metrics, as done in: https://www.sciencedirect.com/science/article/pii/S0031320313002665
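This does not reproduce the linked paper. As a generic, swapped-in option for judging a regression model directly, without any threshold, scikit-learn's standard regression metrics can be applied to y_test and y_predicted. The arrays below are invented for illustration.

```python
import numpy as np
from sklearn import metrics

# Made-up true and predicted death rates, for illustration only
y_test = np.array([0.10, 0.25, 0.40, 0.55])
y_predicted = np.array([0.12, 0.20, 0.43, 0.50])

# Mean absolute error: average size of the errors, in the target's units
mae = metrics.mean_absolute_error(y_test, y_predicted)
# Root mean squared error: penalizes large errors more heavily
rmse = np.sqrt(metrics.mean_squared_error(y_test, y_predicted))
# Coefficient of determination: 1.0 is a perfect fit
r2 = metrics.r2_score(y_test, y_predicted)
print(f"MAE={mae:.4f}  RMSE={rmse:.4f}  R^2={r2:.4f}")
```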