python pandas numpy scikit-learn logistic-regression

ValueError: Unknown label type: 'unknown'


I am trying to run the following code.

import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression


# data import and preparation
trainData = pd.read_csv('train.csv')
train = trainData.values
testData = pd.read_csv('test.csv')
test = testData.values
X = np.c_[train[:, 0], train[:, 2], train[:, 6:7],  train[:, 9]]
X = np.nan_to_num(X)
y = train[:, 1]
Xtest = np.c_[test[:, 0:1], test[:, 5:6],  test[:, 8]]
Xtest = np.nan_to_num(Xtest)


# model
lr = LogisticRegression()
lr.fit(X, y)

where y is an np.ndarray of 0s and 1s.

However, I receive the following error:

File "C:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py", line 1174, in fit
  check_classification_targets(y)
File "C:\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
  raise ValueError("Unknown label type: %r" % y_type)

ValueError: Unknown label type: 'unknown'

From the sklearn documentation, I see that

y : array-like, shape (n_samples,)
Target values (class labels in classification, real numbers in regression)

What is my error?

FYI, y is np.array([0.0, 1.0, 1.0, ..., 0.0, 1.0, 0.0], dtype=object) and its shape is (891,).


Solution

  • Your y has dtype object, so sklearn cannot infer what kind of labels it holds, even though every element is 0.0 or 1.0. Add the line y = y.astype('int') right after the line y = train[:, 1].
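
To see why the cast helps, here is a minimal sketch. It assumes that train.csv mixes numeric and text columns, which is the usual reason .values returns an object array. The small DataFrame and its column names (Id, Label, Name) are made up for illustration and are not the asker's data. type_of_target is the scikit-learn helper that check_classification_targets relies on.

```python
import numpy as np
import pandas as pd
from sklearn.utils.multiclass import type_of_target

# Hypothetical stand-in for train.csv: the text column forces
# DataFrame.values to fall back to a single object-dtype array.
frame = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Label": [0.0, 1.0, 1.0, 0.0],
    "Name": ["a", "b", "c", "d"],
})

y_raw = frame.values[:, 1]
print(y_raw.dtype)             # object
print(type_of_target(y_raw))   # unknown

# The fix from the answer: cast the labels to a numeric dtype.
y = y_raw.astype(int)
print(type_of_target(y))       # binary

# Alternative: take the column from the DataFrame directly,
# which keeps its own numeric dtype instead of object.
y_alt = frame["Label"].to_numpy().astype(int)
print(type_of_target(y_alt))   # binary
```

Either version of y can then be passed to lr.fit(X, y). If X was built from the same object array, casting it with X.astype(float) may also be worth doing, although the error in the question comes from y only.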