I am using a basic LogisticRegression on data for which the target variable is multiclass.
I was expecting LogisticRegression to raise an error when fit() was called, but it didn't.
Does LogisticRegression handle this case by default? If yes, what transformations are applied to the target variable?
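import pandas as pd
from sklearn.linear_model import LogisticRegression

# Four numeric features (A-D) and a string-valued target (E)
ddf = pd.DataFrame(
    [[1, 2, 3, 4, "Blue"],
     [4, 2, 3, 4, "Red"],
     [5, 2, 8, 4, "Red"],
     [2, 7, 3, 9, "Green"],
     [7, 6, 7, 4, "Blue"]],
    columns=['A', 'B', 'C', 'D', 'E']
)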
ddf
X = ddf[['A', 'B', 'C', 'D']]
y = ddf['E']
lr = LogisticRegression()
lr.fit(X, y)
preds = lr.predict(X)
print(preds)
This gives the output: ['Blue' 'Red' 'Red' 'Green' 'Blue']
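Scikit-learn is able to handle string labels for all the classifiers by default. Internally it creates a LabelEncoder object (have a look at the code here), so string class labels are encoded to integer values. The original labels are stored in the fitted model's classes_ attribute, which is why predict() returns strings rather than integers.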
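To make the encoding step concrete, here is a minimal pure-Python sketch of what a label encoder conceptually does: collect the sorted unique labels, map each label to its index, and map indices back to labels after prediction. The helper names fit_labels, encode and decode are hypothetical and used only for illustration; this is not scikit-learn's actual implementation, just an approximation of the behaviour of LabelEncoder's fit_transform and inverse_transform.

```python
# Conceptual stand-in for a label encoder. The function names below are
# hypothetical helpers for illustration, not part of scikit-learn.

def fit_labels(labels):
    """Return the sorted unique class labels (analogous to classes_)."""
    return sorted(set(labels))

def encode(labels, classes):
    """Map each label to the index of its class."""
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels]

def decode(codes, classes):
    """Map integer codes back to the original labels."""
    return [classes[i] for i in codes]

y = ["Blue", "Red", "Red", "Green", "Blue"]

classes = fit_labels(y)          # ['Blue', 'Green', 'Red']
codes = encode(y, classes)       # [0, 2, 2, 1, 0]
restored = decode(codes, classes)

print(classes)
print(codes)
print(restored)
```

The model is trained on the integer codes, and predictions are translated back through the stored class list, so the caller only ever sees the original string labels.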