
Having difficulty with understanding logistic regression


I'm currently learning logistic regression and I'm having some difficulties.

This is my code. First, I import the libraries:

import numpy as np
from sklearn.linear_model import LogisticRegression as lr
import matplotlib.pyplot as plt

I set up the data (a very simple one):

# First column: the number of cigarettes somebody smokes.
# Second column: whether they have cancer (0 means no, 1 means yes).
data = np.array([[0, "0"],
                 [10, "0"],
                 [60, "1"],
                 [90, "1"]])

Now I make the model with a one-liner:

model = lr().fit(X=data[:,0].reshape(len(data),1),y=data[:,1])

Then I make some predictions:

pred = model.predict([[4],[75],[14],[55]])

Now, here are my difficulties:

First, how can I plot this model using matplotlib?

Second, if I use:

pred = model.predict_proba([[4],[75],[14],[55]])

I will get the probabilities, right? But why do the probabilities look like this?

array([[9.98960882e-01, 1.03911777e-03],
       [1.59627706e-04, 9.99840372e-01],
       [9.90711371e-01, 9.28862908e-03],
       [1.28043403e-02, 9.87195660e-01]])

Shouldn't they be between 0 and 1? Why is each value close to either 1 or 9? Also, what do e-01, e-04, etc. mean? I tried to predict for 4 numbers, so why do I get 8 predictions?

Sorry if I'm asking too many questions; I'm just curious.


Solution

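  • When you call the predict_proba method of a sklearn classifier, you are asking the model: for each input, what is the probability that it belongs to the first class, the second class, ..., the last class? The result has one row per input and one column per class, so 4 inputs and 2 classes give a 4 × 2 array. That is 8 numbers, but still only 4 predictions, and each row sums to 1.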
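A minimal sketch to inspect that structure directly. It refits the question's toy data, assuming NumPy and scikit-learn are installed; the explicit astype(float) on the feature column is an addition, because the mixed int/string array stores everything as text. model.classes_ gives the column order of predict_proba.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy data as in the question: cigarettes smoked -> cancer label ("0"/"1").
data = np.array([[0, "0"], [10, "0"], [60, "1"], [90, "1"]])
X = data[:, 0].astype(float).reshape(-1, 1)  # one numeric feature column
y = data[:, 1]                                # string labels "0" and "1"

model = LogisticRegression().fit(X, y)

inputs = [[4], [75], [14], [55]]
proba = model.predict_proba(inputs)

print(model.classes_)     # column order of proba: ['0' '1']
print(proba.shape)        # (4, 2): one row per input, one column per class
print(proba.sum(axis=1))  # every row sums to 1
```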

    In your case you have 2 classes ("0" and "1"). Take this line:

    pred = model.predict_proba([[4]])
    

    The output is:

    array([[9.98960882e-01, 1.03911777e-03]])
    

    It means that your input (4) has a probability of 9.98960882e-01 of belonging to the first class ("0" in your case) and a probability of 1.03911777e-03 of belonging to the second class ("1" in your case).
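Where do these numbers come from? Logistic regression computes a linear score w·x + b and passes it through the sigmoid function 1 / (1 + e^(-z)), which always lands between 0 and 1. The sketch below, assuming a two-class LogisticRegression with default settings, recomputes the class-"1" probability by hand from the fitted coef_ and intercept_ and compares it with predict_proba. The variable names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

data = np.array([[0, "0"], [10, "0"], [60, "1"], [90, "1"]])
X = data[:, 0].astype(float).reshape(-1, 1)
y = data[:, 1]
model = LogisticRegression().fit(X, y)

x_new = np.array([[4.0], [75.0], [14.0], [55.0]])

# Linear score z = w*x + b, using the fitted coefficient and intercept.
z = x_new[:, 0] * model.coef_[0, 0] + model.intercept_[0]

# The sigmoid squashes any score into the interval (0, 1).
p_class1 = 1.0 / (1.0 + np.exp(-z))

proba = model.predict_proba(x_new)
print(np.column_stack([1 - p_class1, p_class1]))  # hand-computed
print(proba)                                      # from scikit-learn
```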

    The e-N suffix is scientific notation for "times 10 to the power of -N", so:

    • 9.98960882e-01 = 0.998960882
    • 1.03911777e-03 = 0.00103911777
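A small sketch showing that both spellings are the same number, and how to print values without scientific notation. The np.set_printoptions(suppress=True) call is one option among several; plain f-string formatting works for single values.

```python
import numpy as np

a = 9.98960882e-01
b = 1.03911777e-03

# "e-03" means "times 10**-3"; both spellings parse to the same float.
print(b == 0.00103911777)  # True
print(f"{a:.9f}")          # 0.998960882
print(f"{b:.11f}")         # 0.00103911777

# Ask NumPy to print small numbers in fixed-point instead of scientific notation.
np.set_printoptions(suppress=True)
print(np.array([[a, b]]))
```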

    If you want a single class label per input, use predict(inputs) as you did before; it returns the class with the highest probability.
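To make the link between the two methods concrete, here is a sketch (same toy data, feature column cast to float) that picks the highest-probability column of predict_proba for each row and maps it back to a label through model.classes_. For a two-class model this should agree with predict; the name manual is just illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

data = np.array([[0, "0"], [10, "0"], [60, "1"], [90, "1"]])
X = data[:, 0].astype(float).reshape(-1, 1)
y = data[:, 1]
model = LogisticRegression().fit(X, y)

inputs = [[4], [75], [14], [55]]
proba = model.predict_proba(inputs)

# For each row, take the column with the highest probability and look up its label.
manual = model.classes_[np.argmax(proba, axis=1)]
pred = model.predict(inputs)
print(pred)
print(manual)
```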

    To plot it, first convert your labels to integers, then make a regular plot with x = the inputs you want to predict and y = the predictions (or the class-"1" probabilities from predict_proba, which give the familiar S-shaped curve).

    For more on plotting, check out the pyplot tutorial: https://matplotlib.org/stable/tutorials/introductory/pyplot.html
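A possible plotting sketch along those lines, assuming matplotlib's object-oriented API. The input range 0 to 100, the axis labels, and the output filename logistic_fit.png are arbitrary choices for illustration. It draws the training points, the fitted probability curve, and the hard 0/1 predictions.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

data = np.array([[0, "0"], [10, "0"], [60, "1"], [90, "1"]])

# Convert both columns to numbers (the mixed array stored everything as text).
X = data[:, 0].astype(float).reshape(-1, 1)
y = data[:, 1].astype(int)

model = LogisticRegression().fit(X, y)

# A dense grid of inputs so the fitted curve looks smooth.
x_grid = np.linspace(0, 100, 300).reshape(-1, 1)
p_cancer = model.predict_proba(x_grid)[:, 1]  # column 1 = class 1 ("has cancer")

fig, ax = plt.subplots()
ax.scatter(X[:, 0], y, color="black", zorder=3, label="training data")
ax.plot(x_grid[:, 0], p_cancer, label="P(cancer = 1)")
ax.plot(x_grid[:, 0], model.predict(x_grid), linestyle="--", label="predicted class")
ax.set_xlabel("cigarettes smoked")
ax.set_ylabel("probability / class")
ax.legend()
fig.savefig("logistic_fit.png")  # or plt.show() in an interactive session
```

Plotting probabilities rather than only predict's output shows how confident the model is near the decision boundary, where the curve crosses 0.5.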