Tags: python, machine-learning, scikit-learn, xgboost

ValueError: y should be a 1d array, got an array of shape instead


I know this has been asked many times, but I cannot figure it out.

I have a dataset in this format: the first 767 columns hold the training features, and the next 669 columns are the labels.

The labels are one-hot vectors, i.e. [0,0,0,...,1,0,0], which is why there are 669 label columns. I want to train XGBoost on this data. My code is:

self.clf = XGBClassifier(objective="multi:softmax", num_classes=669)
data = single_data.iloc[:, 0:767]
label = single_data.iloc[:, 767:]
self.clf.fit(data, label)

The error I get is

ValueError: y should be a 1d array, got an array of shape (1638, 670) instead.

How can I solve this? Thanks


Solution

  • I'm assuming your data looks like this:

    import pandas as pd
    label = pd.DataFrame({'c0':[0,1,0,0,0], 'c1':[1,0,0,0,0], 'c2':[0,0,1,1,0], 'c3':[0,0,0,0,1]})
    print(label)
    

    Output

       c0  c1  c2  c3
    0   0   1   0   0
    1   1   0   0   0
    2   0   0   1   0
    3   0   0   1   0
    4   0   0   0   1
    

    Then convert them to integers:

    label = label.apply(lambda x: x.argmax(), axis=1).values
    

    Now your labels look like this, a single array:

    array([1, 0, 2, 2, 3], dtype=int64)
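
For the layout in the question (feature columns first, one-hot label columns after), the same idea can be sketched end to end. This is only a minimal sketch: the small `single_data` frame and `n_features = 3` are made-up stand-ins for the real 767-feature data, and the row-sum check is an extra sanity step that is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `single_data`: 3 feature columns followed by
# 4 one-hot label columns (the question has 767 features instead).
n_features = 3
single_data = pd.DataFrame(
    [
        [0.1, 0.2, 0.3, 0, 1, 0, 0],
        [0.4, 0.5, 0.6, 1, 0, 0, 0],
        [0.7, 0.8, 0.9, 0, 0, 1, 0],
        [1.0, 1.1, 1.2, 0, 0, 1, 0],
        [1.3, 1.4, 1.5, 0, 0, 0, 1],
    ]
)

# Split features from the one-hot label block.
data = single_data.iloc[:, :n_features]
one_hot = single_data.iloc[:, n_features:]

# Sanity check: every row should contain exactly one 1.
assert (one_hot.sum(axis=1) == 1).all()

# Position of the 1 in each row, as a 1-D integer array.
label = one_hot.to_numpy().argmax(axis=1)

print(label.shape)  # (5,)
print(label)        # [1 0 2 2 3]
```

With `label` in this shape, `self.clf.fit(data, label)` receives the 1-D `y` that the error message asks for. As far as I know, the native XGBoost parameter is spelled `num_class` rather than `num_classes`, and the scikit-learn wrapper works out the number of classes from `y`, so that argument may not be needed at all.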