machine-learning keras scikit-learn linear-regression

Why does my Linear Regression model give me an error when all of my inputs are integers?


I want to try all regression algorithms on my dataset and choose the best one. I decided to start with Linear Regression, but I get an error. I tried scaling the data, but then I get a different error.

Here is my code:

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

train_df = pd.read_csv('train.csv', index_col='ID')
train_df.head()
target = 'Result'

X = train_df.drop(target, axis=1)
y = train_df[target]

# Trying to scale and get even worse error
#ss = StandardScaler()
#df_scaled = pd.DataFrame(ss.fit_transform(train_df),columns = train_df.columns)
#X = df_scaled.drop(target, axis=1)
#y = df_scaled[target]

model = LogisticRegression() 
model.fit(X, y) 

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=10000,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=None, solver='lbfgs', tol=0.0001, verbose=10,
                   warm_start=False)
                   

print(X.iloc[10])
print(model.predict([X.iloc[10]]))
print(y[10])

Here is the error:

ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
A     0
B   -19
C   -19
D   -19
E     0
F   -19
Name: 10, dtype: int64
[0]
-19

And here is a sample of the dataset:

ID,A,B,C,D,E,F,Result
0,-18,18,18,-2,-12,-3,-19
1,-19,-8,0,18,18,1,0
2,0,-11,18,0,-19,18,18
3,18,-15,-12,18,-11,-4,-17
4,-17,18,-11,-17,-18,-19,18
5,18,-14,-19,-14,-15,-19,18
6,18,-17,18,18,18,-2,-1
7,-1,-11,0,18,18,18,18
8,18,-19,-18,-19,-19,18,18
9,18,18,0,0,18,18,0
10,0,-19,-19,-19,0,-19,-19
11,-19,0,-19,18,-19,-19,-6
12,-6,18,0,0,0,18,-15
13,-15,-19,-6,-19,-19,0,0
14,0,-15,0,18,18,-19,18
15,18,-19,18,-8,18,-2,-4
16,-4,-4,18,-19,18,18,18
17,18,0,18,-4,-10,0,18
18,18,0,18,18,18,18,-19

What am I doing wrong?


Solution

  • You're using LogisticRegression, which, despite its name, is a classification model: it is used for categorical dependent variables, so every distinct value in Result is treated as a separate class.

    This is not necessarily wrong, as you might intend to do so, but it means you need sufficient data per category and enough iterations for the solver to converge. The ConvergenceWarning in your output says that lbfgs hit the iteration limit before converging.

    I suspect, however, that what you intended to use is LinearRegression from scikit-learn (sklearn.linear_model), which is meant for continuous dependent variables.
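
As a minimal sketch of that swap, assuming Result really is meant to be a continuous target: the code below is the question's code with LinearRegression in place of LogisticRegression. The sample rows from the question are inlined through io.StringIO (the CSV_SAMPLE name is only for illustration) so that it runs without train.csv. It is not tuned or validated, and a fit on 19 rows says little about how well the model generalizes.

```python
import io

import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample rows copied from the question, inlined so the sketch runs without train.csv.
CSV_SAMPLE = """ID,A,B,C,D,E,F,Result
0,-18,18,18,-2,-12,-3,-19
1,-19,-8,0,18,18,1,0
2,0,-11,18,0,-19,18,18
3,18,-15,-12,18,-11,-4,-17
4,-17,18,-11,-17,-18,-19,18
5,18,-14,-19,-14,-15,-19,18
6,18,-17,18,18,18,-2,-1
7,-1,-11,0,18,18,18,18
8,18,-19,-18,-19,-19,18,18
9,18,18,0,0,18,18,0
10,0,-19,-19,-19,0,-19,-19
11,-19,0,-19,18,-19,-19,-6
12,-6,18,0,0,0,18,-15
13,-15,-19,-6,-19,-19,0,0
14,0,-15,0,18,18,-19,18
15,18,-19,18,-8,18,-2,-4
16,-4,-4,18,-19,18,18,18
17,18,0,18,-4,-10,0,18
18,18,0,18,18,18,18,-19
"""

train_df = pd.read_csv(io.StringIO(CSV_SAMPLE), index_col="ID")
target = "Result"

X = train_df.drop(columns=target)
y = train_df[target]

# LinearRegression treats Result as a continuous value, not as a set of classes.
model = LinearRegression()
model.fit(X, y)

# Select the row with double brackets so predict() receives a one-row DataFrame
# with the same column names the model was fitted on.
prediction = model.predict(X.iloc[[10]])

print(X.iloc[10])
print(prediction)
print(y.loc[10])
```

LinearRegression is fitted by ordinary least squares rather than by an iterative solver such as lbfgs, so it has no max_iter setting and does not raise this ConvergenceWarning. The prediction is a float and will generally not match the target value exactly.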