I have a mobile price classification dataset with 20 features and one target variable called price_range. I need to classify mobile prices as low, medium, high, or very high. I applied one-hot encoding to my target variable and then split the data into trainX, testX, trainy, and testy, so the shapes of trainX and trainy are (1600,20) and (1600,4) respectively. When I try to fit LogisticRegression with lr.fit(trainX,trainy), I get an error that says: bad input (1600,4). I understand that trainy has to have shape (1600,1), but one-hot encoding gives me an array with 4 columns, one for each price_range value.
So now I am confused: how do people use one-hot encoding for a target variable in practice? Please help me out.
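To train the model, apply OneHotEncoder only to the categorical features that make up X. For the target y, use LabelEncoder() instead: it maps each class to a single integer, so y stays a one-dimensional array of shape (1600,), which is what LogisticRegression expects.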
from sklearn import preprocessing
# LabelEncoder assigns one integer per distinct label, in sorted order
le = preprocessing.LabelEncoder()
le.fit_transform(['a', 'b', 'a'])
which returns:
output: array([0, 1, 0])
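If the target has already been one-hot encoded, as in the question, it can be collapsed back to a single column of class indices before fitting. The following is a minimal sketch, not the asker's actual pipeline: the data is randomly generated to match the shapes in the question, and it assumes that column i of the one-hot array corresponds to class i.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the dataset (assumption): 1600 rows, 20 features, 4 classes.
rng = np.random.default_rng(0)
trainX = rng.normal(size=(1600, 20))
labels = rng.integers(0, 4, size=1600)

# One-hot encoded target with shape (1600, 4), as described in the question.
trainy_onehot = np.eye(4, dtype=int)[labels]

# Collapse each row to the index of its 1, giving a 1-D array of shape (1600,).
trainy = np.argmax(trainy_onehot, axis=1)

# The 1-D target is accepted by LogisticRegression.
lr = LogisticRegression(max_iter=1000)
lr.fit(trainX, trainy)
preds = lr.predict(trainX[:5])
```

Here preds contains integer class indices. If the original string labels are needed, they can be recovered with the inverse_transform method of the LabelEncoder that produced the integers.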