
How to use one-hot encoding for a multi-class target (trainy) in the .fit() method?


I have a mobile price classification dataset with 20 features and one target variable, price_range. I need to classify mobile prices as low, medium, high or very high. I applied one-hot encoding to the target variable and then split the data into trainX, testX, trainy and testy, so trainX has shape (1600, 20) and trainy has shape (1600, 4).

When I try to fit a logistic regression model with lr.fit(trainX, trainy), I get an error saying: bad input (1600, 4). So I understand that trainy has to have shape (1600, 1), but one-hot encoding gives me an array with 4 columns, one for each price_range value, which is exactly what one-hot encoding is supposed to produce.

So now I am confused: how do people use one-hot encoding for the target variable in practice? Please help me out.
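A minimal reproduction of the shapes described above. It uses synthetic random data as a stand-in for the mobile price dataset (an assumption, since the real data is not shown) and builds the one-hot target with np.eye, which may differ from how the target was actually encoded.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the mobile price data (assumption): 1600 rows,
# 20 features and a 4-class label (0 = low ... 3 = very high).
rng = np.random.default_rng(0)
trainX = rng.normal(size=(1600, 20))
labels = rng.integers(0, 4, size=1600)

# One-hot encode the target: row i gets a 1 in column labels[i].
trainy = np.eye(4)[labels]
print(trainX.shape, trainy.shape)  # (1600, 20) (1600, 4)

lr = LogisticRegression()
fit_failed = False
try:
    lr.fit(trainX, trainy)
except ValueError as err:
    # LogisticRegression expects a 1-D target, so a (n_samples, 4) array is rejected.
    fit_failed = True
    print("ValueError:", err)
```

The exact wording of the error message depends on the scikit-learn version, but in each case it is a ValueError about the shape of y.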


Solution

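  • To train the model, apply OneHotEncoder only to the categorical features when building X, and use LabelEncoder() to convert y into a 1-D array of integer class labels. scikit-learn classifiers such as LogisticRegression handle multiclass targets directly from these labels, so the target itself does not need to be one-hot encoded.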

    from sklearn import preprocessing

    le = preprocessing.LabelEncoder()
    le.fit_transform(['a', 'b', 'a'])  # map each distinct class to an integer


    which returns:

    output: array([0, 1, 0])
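Here is an end-to-end sketch of that approach applied to the question's setup. It again uses synthetic data, and the string labels "low", "medium", "high" and "very high" are a hypothetical stand-in for price_range. The target is label-encoded before the split, so trainy comes out 1-D, and the predictions are decoded back to strings with inverse_transform. The last lines show np.argmax as a way to recover 1-D labels from a target that has already been one-hot encoded.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic features plus hypothetical string labels standing in for price_range.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
price_range = rng.choice(["low", "medium", "high", "very high"], size=2000)

# LabelEncoder turns the strings into a 1-D integer array of shape (2000,).
le = LabelEncoder()
y = le.fit_transform(price_range)
print(le.classes_)  # ['high' 'low' 'medium' 'very high'] (sorted alphabetically)

trainX, testX, trainy, testy = train_test_split(X, y, test_size=0.2, random_state=0)
print(trainX.shape, trainy.shape)  # (1600, 20) (1600,)

# A 1-D target is what LogisticRegression expects; multiclass is handled internally.
lr = LogisticRegression(max_iter=1000)
lr.fit(trainX, trainy)

# Map integer predictions back to the original string labels.
pred = le.inverse_transform(lr.predict(testX))

# If the target was already one-hot encoded, argmax along axis 1 recovers the labels.
onehot = np.eye(len(le.classes_))[y]
recovered = onehot.argmax(axis=1)
```

Note that LabelEncoder assigns codes in sorted order, so the integers do not follow low < medium < high < very high. That does not matter for LogisticRegression, which treats the classes as unordered categories.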