Tags: python, machine-learning, scikit-learn, logistic-regression

Machine learning not predicting correct results


I am writing a simple Python machine learning script that predicts whether a loan will be approved, based on the following parameters:

business experience: must be greater than 7
year founded: must be after 2015
loan: must have no previous or current loan

The loan is approved only if all of the above conditions are met. The dataset can be downloaded from this link:

https://drive.google.com/file/d/1QtJ3EED7KDqJDrSHxHB6g9kc5YAfTlmF/view?usp=sharing
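
For reference, the approval rule above can be written as a plain function. This is only a minimal sketch of the labelling logic: the function name and argument names are illustrative and are not taken from the dataset.

```python
def loan_approved(business_exp: int, year_founded: int, has_loan: int) -> int:
    """Return 1 if all three conditions from the question hold, else 0."""
    return int(business_exp > 7 and year_founded > 2015 and has_loan == 0)


print(loan_approved(9, 2017, 0))  # 1: all three conditions hold
print(loan_approved(9, 2017, 1))  # 0: has a previous or current loan
print(loan_approved(7, 2017, 0))  # 0: experience is not greater than 7
```

Because the label requires all three conditions at once, a model has to learn their combination rather than each threshold on its own.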

For this data, I have the following script:

from sklearn.linear_model import LogisticRegression
import pandas as pd
import numpy as np

data = pd.read_csv("test2.csv")
data.head()

X = data[["Business Exp", "Year of Founded", "Previous/Current Loan"]]
Y = data["OUTPUT"]

clf = LogisticRegression()
clf.fit(X, Y)

test_x2 = np.array([[9, 2017, 0]])
Y_pred = clf.predict(test_x2)
print(Y_pred)

I am passing the test data in test_x2: business experience of 9, founded in 2017, and no current or previous loan, which means the loan should be approved. The prediction should therefore be 1, but the model outputs 0. Is there an issue with the code or with the dataset? I am a beginner in machine learning, so I created this custom dataset for my own understanding.


Solution

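  • You should use StandardScaler() within a pipeline. The features are on very different scales (the founding year is in the thousands, while the loan flag is 0 or 1), which most likely keeps LogisticRegression's default solver from reaching a good fit. Standardizing the features first avoids this: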

    from sklearn.linear_model import LogisticRegression
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    import pandas as pd
    import numpy as np
    
    data = pd.read_csv("test2.csv")
    data.head()
    
    X = data[["Business Exp", "Year of Founded", "Previous/Current Loan"]]
    Y = data["OUTPUT"]
    
    # Standardize each feature, then fit the logistic regression on the scaled values
    clf = make_pipeline(StandardScaler(), LogisticRegression())
    clf.fit(X, Y)
    
    test_x2 = np.array([[9, 2017, 0]])
    Y_pred = clf.predict(test_x2)
    
    print("prediction = ", Y_pred.item())
    # prediction =  1
    print("score = ", clf.score(X, Y))
    # score =  0.95535
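
To see what the scaler changes, the sketch below standardizes the three features and then inspects the predicted probabilities instead of only the 0/1 label. The linked CSV is not reproduced here, so the data is a hypothetical stand-in: a grid of every combination of experience 1-15, founding year 2005-2022, and loan flag 0/1, labelled with the rule from the question. The numbers it prints will therefore differ from the score reported above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for test2.csv (an assumption, not the real data):
# every combination of experience, founding year, and loan flag.
rows = [(exp, year, loan)
        for exp in range(1, 16)
        for year in range(2005, 2023)
        for loan in (0, 1)]
X = np.array(rows, dtype=float)
# Label with the rule from the question: all three conditions must hold.
y = np.array([int(e > 7 and yr > 2015 and ln == 0) for e, yr, ln in rows])

# Raw columns live on very different scales.
print("raw means:   ", X.mean(axis=0))

# After StandardScaler each column has mean 0 and unit variance.
X_scaled = StandardScaler().fit_transform(X)
print("scaled means:", X_scaled.mean(axis=0).round(6))
print("scaled stds: ", X_scaled.std(axis=0).round(6))

# Same pipeline as in the answer, fitted on the stand-in data.
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X, y)

test_x = np.array([[9, 2017, 0]], dtype=float)
proba = clf.predict_proba(test_x)  # columns follow clf.classes_, here [0, 1]
print("P(rejected), P(approved):", proba[0])
print("training accuracy:", clf.score(X, y))
```

Looking at predict_proba shows how close a borderline case such as [9, 2017, 0] is to the 0.5 decision threshold, which is often more informative than the predicted label alone.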