I'm currently learning Logistic Regression and I have some difficulties.
This is my code. I import the libraries:
import numpy as np
from sklearn.linear_model import LogisticRegression as lr
import matplotlib.pyplot as plt
I set up the data (a very simple one):
# The first column is the number of cigarettes somebody smokes, and the
# second column is whether they have cancer or not (0 meaning no and 1
# meaning yes).
data = np.array([[0, "0"],
[10, "0"],
[60, "1"],
[90, "1"]])
Now I make the model with a one-liner:
model = lr().fit(X=data[:,0].reshape(len(data),1),y=data[:,1])
Then I make some predictions:
pred = model.predict([[4],[75],[14],[55]])
Now, here are my difficulties:
First, how can I plot this model using the matplotlib library?
Second, if I use:
pred = model.predict_proba([[4],[75],[14],[55]])
I will get the probabilities, right? But why do the probabilities look like this?
array([[9.98960882e-01, 1.03911777e-03],
[1.59627706e-04, 9.99840372e-01],
[9.90711371e-01, 9.28862908e-03],
[1.28043403e-02, 9.87195660e-01]])
Shouldn't they be between 0 and 1? Why do they all look like numbers close to 1 or 9? Also, what is that e-01, e-04, etc.? I have tried to predict for 4 numbers, so why do I get 8 predictions?
Sorry if I ask too many questions. I'm just curious.
When you call the predict_proba method of a sklearn model, you are basically asking the model: what is the probability that each input belongs to the first class, the second class, ..., the last class?
In your case, you have 2 classes ("0" and "1"). Let's take this line:
pred = model.predict_proba([[4]])
Your output is
array([[9.98960882e-01, 1.03911777e-03]])
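It means that your input (4) has a probability of 9.98960882e-01 of belonging to the first class ("0" in your case) and a probability of 1.03911777e-03 of belonging to the second class ("1" in your case). That is also why you get 8 numbers for 4 inputs: there is one row per input and one column per class.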
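To see the row and column layout for yourself, here is a minimal sketch. It assumes numeric features and integer labels (0 and 1) instead of the mixed array from the question, and the variable names are only illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumption: numeric features and integer labels, not the mixed array from the question.
X = np.array([[0], [10], [60], [90]], dtype=float)
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

X_new = np.array([[4], [75], [14], [55]], dtype=float)
proba = model.predict_proba(X_new)

print(model.classes_)     # the column order used by predict_proba
print(proba.shape)        # one row per input, one column per class
print(proba.sum(axis=1))  # each row sums to 1 (up to rounding)
```

The columns of predict_proba follow the order of model.classes_, so the second column is the probability of class 1.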
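The e-N notation stands for "times 10 to the power of -N", so 9.98960882e-01 is 0.998960882 and 1.03911777e-03 is 0.00103911777. Both values are between 0 and 1, and together they sum to 1.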
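If the scientific notation is hard to read, you can print the same numbers in fixed-point form. This is just a small sketch using the two values from your output; np.set_printoptions(suppress=True) changes how NumPy displays arrays, not the values themselves.

```python
import numpy as np

p0 = 9.98960882e-01  # same number as 0.998960882
p1 = 1.03911777e-03  # same number as 0.00103911777

print(f"{p0:.6f}")  # 0.998961
print(f"{p1:.6f}")  # 0.001039

# Ask NumPy to avoid scientific notation when printing arrays.
np.set_printoptions(suppress=True)
print(np.array([[p0, p1]]))
```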
If you want a clear prediction (a single class label per input), you should use the predict(inputs) method, as you did before.
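As a rough illustration of how the two methods relate, predict returns the label whose probability is the highest in each row. The sketch below again assumes numeric features and integer labels rather than the array from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumption: numeric features and integer labels, not the mixed array from the question.
X = np.array([[0], [10], [60], [90]], dtype=float)
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

X_new = np.array([[4], [75], [14], [55]], dtype=float)
proba = model.predict_proba(X_new)

# Pick the most probable column in each row and map it back to a class label.
labels_from_proba = model.classes_[np.argmax(proba, axis=1)]
labels = model.predict(X_new)

print(labels)
print(np.array_equal(labels, labels_from_proba))
```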
To plot that, you should first convert your labels to integers, then make a classic plot with x = some inputs you want to predict and y = the predictions.
You should check out this: https://matplotlib.org/stable/tutorials/introductory/pyplot.html
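The plotting steps described above could be sketched like this. It is only one possible way to do it: it assumes integer labels and numeric features, plots the predicted probability of class 1 over a grid of inputs, and saves the figure to a hypothetical file name (logistic_curve.png). The "Agg" backend line is there so the script runs without a display; remove it and call plt.show() if you want a window instead.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

# Assumption: labels converted to integers, features kept numeric.
X = np.array([[0], [10], [60], [90]], dtype=float)
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

# x = some inputs to predict, y = the predicted probability of class 1.
x_grid = np.linspace(0, 100, 200).reshape(-1, 1)
p_cancer = model.predict_proba(x_grid)[:, 1]

fig, ax = plt.subplots()
ax.scatter(X[:, 0], y, color="black", zorder=3, label="training data")
ax.plot(x_grid[:, 0], p_cancer, label="predicted P(class 1)")
ax.axhline(0.5, linestyle="--", color="gray", label="0.5 threshold")
ax.set_xlabel("cigarettes smoked")
ax.set_ylabel("probability of class 1")
ax.legend()
fig.savefig("logistic_curve.png")  # hypothetical output file name
```

Plotting model.predict(x_grid) instead of the probabilities would give a step from 0 to 1 at the point where the curve crosses 0.5.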