I am trying to use ROC for evaluating my emotion text classifier model
This is my code for the ROC :
# ROC-AUC Curve
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt
fpr, tpr, thresholds = roc_curve(y_test, y_test_hat2)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=1, label='ROC curve (area = %0.2f)' % roc_auc)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC CURVE')
plt.legend(loc="lower right")
plt.show()
This is the Error :
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-30-ef4ee0eff994> in <module>()
2 from sklearn.metrics import roc_curve, auc
3 import matplotlib.pyplot as plt
----> 4 fpr, tpr, thresholds = roc_curve(y_test, y_test_hat2)
5 roc_auc = auc(fpr, tpr)
6 plt.figure()
1 frames
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_ranking.py in roc_curve(y_true, y_score, pos_label, sample_weight, drop_intermediate)
961 """
962 fps, tps, thresholds = _binary_clf_curve(
--> 963 y_true, y_score, pos_label=pos_label, sample_weight=sample_weight
964 )
965
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_ranking.py in _binary_clf_curve(y_true, y_score, pos_label, sample_weight)
729 y_type = type_of_target(y_true)
730 if not (y_type == "binary" or (y_type == "multiclass" and pos_label is not None)):
--> 731 raise ValueError("{0} format is not supported".format(y_type))
732
733 check_consistent_length(y_true, y_score, sample_weight)
ValueError: multiclass format is not supported
This is the y_test and y_test_hat2 :
y_test = data_test["label"]
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer()
test_vectors = vectorizer.transform(data_test['tweet'])
classifier_linear2 = LinearSVC(verbose=1)
y_test_hat2=classifier_linear2.predict(test_vectors)
Shape of test_vectors = (1096, 11330)
Shape of y_test = (1096,)
Label in y_test = ['0', '1', '2', '3', '4']
A ROC curve is based on soft predictions, i.e. it uses the predicted probability of an instance to belong to the positive class rather than the predicted class. For example with sklearn one can obtain the probabilities with predict_proba
instead of predict
(for the classifiers which provide it, example).
Note: OP used the tag multiclass-classification, but it's important to note that ROC curves can only be applied to binary classification problems.
One can find a short explanation of ROC curves here.