I know this has been asked many times, but I cannot figure it out.
I have a dataset in this format: the first 767 columns are the training features, and the next 669 columns are the labels.
The labels are one-hot vectors, i.e. [0, 0, 0, ..., 1, 0, 0], which is why there are 669 label columns. Now I want to train on this data using XGBoost. My code is:
self.clf = XGBClassifier(objective="multi:softmax", num_classes=669)
data = single_data.iloc[:, 0:767]
label = single_data.iloc[:, 767:]
self.clf.fit(data, label)
The error I get is:
ValueError: y should be a 1d array, got an array of shape (1638, 670) instead.
How can I solve this? Thanks
I'm assuming your label columns look like this:
import pandas as pd
label = pd.DataFrame({'c0':[0,1,0,0,0], 'c1':[1,0,0,0,0], 'c2':[0,0,1,1,0], 'c3':[0,0,0,0,1]})
print(label)
Output:
   c0  c1  c2  c3
0   0   1   0   0
1   1   0   0   0
2   0   0   1   0
3   0   0   1   0
4   0   0   0   1
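Before converting, it may be worth confirming that every row really is one-hot (only 0s and 1s, exactly one 1 per row), because argmax silently returns the first maximum even for a row of all zeros. This matters here since the error reports 670 label columns where 669 were expected, which could mean an extra non-label column slipped into the slice. The helper name is_one_hot below is hypothetical; this is a minimal sketch on the example frame above.

```python
import pandas as pd

def is_one_hot(df: pd.DataFrame) -> bool:
    """Return True if every cell is 0 or 1 and every row contains exactly one 1."""
    # Every cell must be 0 or 1 ...
    only_binary = bool(df.isin([0, 1]).to_numpy().all())
    # ... and every row must sum to exactly 1.
    one_per_row = bool((df.sum(axis=1) == 1).all())
    return only_binary and one_per_row

label = pd.DataFrame({'c0': [0, 1, 0, 0, 0], 'c1': [1, 0, 0, 0, 0],
                      'c2': [0, 0, 1, 1, 0], 'c3': [0, 0, 0, 0, 1]})
print(is_one_hot(label))  # True
```

On the real data you would call it on single_data.iloc[:, 767:]; a False result points at rows or columns to inspect before training.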
Then convert each one-hot row to an integer class label, i.e. the position of its 1:
label = label.apply(lambda x: x.argmax(), axis=1).values
Now your labels are a single 1-D array, which is what self.clf.fit(data, label) expects:
array([1, 0, 2, 2, 3], dtype=int64)
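Applied to the layout from the question (feature columns first, one-hot label columns after), the preparation might look like the sketch below. It calls NumPy's argmax on the whole label block at once, which should give the same result as the row-wise apply without a Python call per row. The toy frame (column names f0, f1, c0..c2 and the variable n_features) is made up for illustration: 2 feature columns and 3 label columns stand in for the real 767 and 669. The last lines show one way to map integer predictions, such as what the classifier's predict might return for these labels, back to label column names or one-hot rows.

```python
import numpy as np
import pandas as pd

# Toy stand-in for single_data: 2 feature columns followed by 3 one-hot label columns.
# (Illustrative only; the real data has 767 feature columns and 669 label columns.)
single_data = pd.DataFrame({
    'f0': [0.1, 0.5, 0.9, 0.3],
    'f1': [1.0, 0.2, 0.4, 0.8],
    'c0': [0, 1, 0, 0],
    'c1': [1, 0, 0, 1],
    'c2': [0, 0, 1, 0],
})
n_features = 2  # would be 767 in the question

data = single_data.iloc[:, :n_features]    # features, shape (n_rows, n_features)
onehot = single_data.iloc[:, n_features:]  # one-hot labels, shape (n_rows, n_classes)

# Position of the 1 in each row -> integer class label, shape (n_rows,)
label = onehot.to_numpy().argmax(axis=1)
print(label)        # [1 0 2 1]
print(label.shape)  # (4,)
# label is now 1-D, so it can be passed as y, e.g. self.clf.fit(data, label)

# Mapping integer predictions back (pred is a hypothetical stand-in for model output):
pred = np.array([2, 0])
print(list(onehot.columns[pred]))                # ['c2', 'c0']
print(np.eye(onehot.shape[1], dtype=int)[pred])  # one-hot rows for each prediction
```

Because argmax returns positions 0..n_classes-1, the resulting labels are already in the contiguous integer form that multi-class objectives such as multi:softmax work with.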