I am running a logistic regression, but I am getting an f1-score of 0.0 for class 1. I think this has to do with a zero-division error, but I am unable to fix it.
data4=data[['Age','BusinessTravel_Travel_Frequently','DistanceFromHome','Education','EnvironmentSatisfaction','Gender_Male','JobInvolvement','YearsWithCurrManager','MaritalStatus_Married','JobSatisfaction','NumCompaniesWorked','TotalWorkingYears','TrainingTimesLastYear','YearsAtCompany','Performance_dummy']]
X1=data4[['Age','BusinessTravel_Travel_Frequently','DistanceFromHome','Education','EnvironmentSatisfaction','Gender_Male','JobInvolvement','YearsWithCurrManager','MaritalStatus_Married','JobSatisfaction','NumCompaniesWorked','TotalWorkingYears','TrainingTimesLastYear','YearsAtCompany']]
y1=data4.Performance_dummy
# split X and y into training and testing sets
from sklearn.model_selection import train_test_split
X_train1,X_test1,y_train1,y_test1=train_test_split(X1,y1,test_size=0.5,random_state=0,stratify=y1)
# import the class
from sklearn.linear_model import LogisticRegression
# instantiate the model (using the default parameters)
logreg1 = LogisticRegression(max_iter=1000)
# fit the model with data
logreg1.fit(X_train1,y_train1)
# predict the labels of the test set
y_pred1=logreg1.predict(X_test1)
print('Accuracy of logistic regression classifier on test set: {:.2f}'.format(logreg1.score(X_test1, y_test1)))
I got the following output
Accuracy of logistic regression classifier on test set: 0.85
I ran the confusion matrix code as shown below
from sklearn.metrics import confusion_matrix
# store the result under a different name so the imported function is not overwritten
cm = confusion_matrix(y_test1, y_pred1)
print("Confusion Matrix:\n", cm)
from sklearn.metrics import classification_report
print("Classification Report:\n",classification_report(y_test1, y_pred1,zero_division=1))
output for above code
Confusion Matrix:
[[622 0]
[113 0]]
Classification Report:
              precision    recall  f1-score   support
           0       0.85      1.00      0.92       622
           1       1.00      0.00      0.00       113
    accuracy                           0.85       735
   macro avg       0.92      0.50      0.46       735
weighted avg       0.87      0.85      0.78       735
I also ran this code to check the ratio of outcomes in my training and test data and got the following output. I am still not sure how to fix this zero-division error.
from collections import Counter
print(Counter(y_train1))
print(Counter(y_test1))
output
Counter({0: 622, 1: 113})
Counter({0: 622, 1: 113})
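Your f1-score is ill-defined because your model only ever predicts one class (0). With no predicted positives, the precision for class 1 is 0/0, and zero_division=1 merely replaces that undefined value with 1.0. The recall for class 1 is 0/113 = 0, so its f1-score is reported as 0 regardless.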
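To make the arithmetic concrete, here is a minimal sketch that rebuilds labels matching the confusion matrix in the question (622 true negatives, 113 false negatives, no predicted positives) and recomputes the class 1 metrics. The arrays y_true and y_pred are stand-ins reconstructed from those counts, not the original data.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Stand-in labels reconstructed from the confusion matrix [[622, 0], [113, 0]].
y_true = np.array([0] * 622 + [1] * 113)
y_pred = np.zeros_like(y_true)  # the model predicts class 0 for every row

# Counts for the positive class (1).
tp = int(((y_true == 1) & (y_pred == 1)).sum())
fp = int(((y_true == 0) & (y_pred == 1)).sum())
fn = int(((y_true == 1) & (y_pred == 0)).sum())
print("tp, fp, fn:", tp, fp, fn)

# precision = tp / (tp + fp) is 0/0 here, so zero_division decides what is reported.
precision_as_1 = precision_score(y_true, y_pred, zero_division=1)
precision_as_0 = precision_score(y_true, y_pred, zero_division=0)

# recall = tp / (tp + fn) is well defined and equals 0.
recall = recall_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)

print("precision (zero_division=1):", precision_as_1)
print("precision (zero_division=0):", precision_as_0)
print("recall:", recall)
print("f1:", f1)
```

Changing zero_division only changes the number printed for the undefined precision; it does not make the model predict class 1.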
You could pass class_weight="balanced" to your LogisticRegression so that misclassifying samples from the under-represented class is penalized more heavily.
If this does not work, it might be wise to increase the size of the training set or use a more advanced model.
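As a rough illustration of the class_weight suggestion, the sketch below compares a default and a balanced LogisticRegression on synthetic data. Everything about the data is an assumption made for the example: a single made-up feature, an 85/15 class ratio similar to the question, and overlapping classes. It says nothing about how the real HR features will behave.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data (hypothetical): one feature, classes overlap heavily.
rng = np.random.default_rng(0)
n_neg, n_pos = 1700, 300
X = np.concatenate([
    rng.normal(0.0, 1.0, n_neg),  # majority class 0
    rng.normal(1.0, 1.0, n_pos),  # minority class 1
]).reshape(-1, 1)
y = np.array([0] * n_neg + [1] * n_pos)

# Same kind of split as in the question: 50% test size, stratified on the label.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

# Default model versus a model that reweights classes inversely to their frequency.
plain = LogisticRegression(max_iter=1000).fit(X_train, y_train)
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X_train, y_train)

pred_plain = plain.predict(X_test)
pred_balanced = balanced.predict(X_test)

recall_plain = recall_score(y_test, pred_plain, zero_division=0)
recall_balanced = recall_score(y_test, pred_balanced, zero_division=0)

print("predicted positives (default): ", int(pred_plain.sum()))
print("predicted positives (balanced):", int(pred_balanced.sum()))
print("recall for class 1 (default): ", recall_plain)
print("recall for class 1 (balanced):", recall_balanced)
```

The balanced model trades some accuracy on class 0 for more predicted positives, so it is worth checking precision and recall for both classes rather than accuracy alone.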