python matplotlib scikit-learn logistic-regression

Plotting prediction from logistic regression


I would like to plot y_test and the predictions in a scatter plot. I am using logistic regression as the model.

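from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split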

vectorizer = CountVectorizer()

X = vectorizer.fit_transform(df['Spam'])
y = df['Label']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=27)

lr = LogisticRegression(solver='liblinear').fit(X_train, y_train)
pred_log = lr.predict(X_test)

I have tried the following:

## Plot the model
import matplotlib.pyplot as plt

plt.scatter(y_test, pred_log)
plt.xlabel("True Values")
plt.ylabel("Predictions")
plt.show()

and I got this:

[image: scatter plot of y_test against pred_log]

which I do not think is what I should expect. y_test has shape (250,), and so does pred_log.

Am I plotting the wrong variables, or are they right? I have no idea what the plot with those four values means. I would have expected more dots in the plot, but maybe I am wrong.

Please let me know if you need more info. Thanks


Solution

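  • LogisticRegression is a classification algorithm. In binary classification, predict returns a class label, 0 or 1, for each sample, so y_test and pred_log each contain only two distinct values. A scatter plot of one against the other can therefore show at most four distinct points, (0, 0), (0, 1), (1, 0) and (1, 1), with all 250 samples stacked on top of them. That makes a scatter plot a poor way to visualize classification results. To see how the model performs, consider a confusion matrix instead.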

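    import seaborn as sns
    from sklearn.metrics import confusion_matrix

    # Rows are true labels, columns are predicted labels
    cm = confusion_matrix(y_test, pred_log)
    sns.heatmap(cm, annot=True, fmt='d')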
    

    The confusion matrix shows how many samples of each class were predicted correctly and how many were not. From it you can calculate the model's accuracy, as well as other metrics such as precision, recall, and F1 score.
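    As a rough, self-contained sketch of the idea, the snippet below counts the distinct (true, predicted) pairs a scatter plot could show and then prints the confusion matrix and per-class metrics. The data is an assumption: make_classification generates a synthetic stand-in for the vectorized spam data, since the original df is not shown, so the numbers will differ from the asker's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic binary data standing in for the vectorized spam messages
# (an assumption; the original DataFrame is not available).
X, y = make_classification(n_samples=500, n_features=20, random_state=27)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=27
)

lr = LogisticRegression(solver="liblinear").fit(X_train, y_train)
pred_log = lr.predict(X_test)

# Both arrays hold only 0s and 1s, so a scatter plot of one against the
# other can show at most four distinct points.
distinct_points = np.unique(np.column_stack([y_test, pred_log]), axis=0)
print(distinct_points)

# The confusion matrix counts how many samples fall on each of those points.
# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, pred_log)
print(cm)

# Precision, recall and F1 score for each class.
print(classification_report(y_test, pred_log))
```

    With the asker's own data, the same confusion_matrix and classification_report calls should work directly on y_test and pred_log. Passing cm to sns.heatmap(cm, annot=True, fmt='d') then draws the counts as a heatmap.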