Tags: python, machine-learning, one-hot-encoding

How do I use OneHotEncoder on a single categorical column?


Here the shape of df is (190, 2), where the 1st column, x, is categorical and the 2nd column is an integer.

X = df.iloc[:,0].values
y = df.iloc[:,-1].values

# Encoding categorical data

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder = LabelEncoder()
X = labelencoder.fit_transform(X)
X.reshape(-1,1)
onehotencoder = OneHotEncoder(categories = [0])
X = onehotencoder.fit_transform(X).toarray()

Here I wanted to encode the categorical values in X using OneHotEncoder in order to predict y, but when I run this code I get an error:

ValueError: bad input shape ()

Can someone help me resolve this issue? Thanks.


Solution

  • OneHotEncoder in current versions of scikit-learn does not require the input features to be numerical, so the LabelEncoder step is unnecessary and you can feed it the categorical features directly:

    onehotencoder = OneHotEncoder()
    X_oh = onehotencoder.fit_transform(X).toarray()
    

    If the input is a 1D array, as X is here and as y usually is, you'll need to reshape it into a 2D array first:

    onehotencoder = OneHotEncoder()
    X_oh = onehotencoder.fit_transform(X.reshape(-1,1)).toarray()
    

    Note, however, that the following line from your code:

    X.reshape(-1,1)

    does not do anything on its own. reshape is not an in-place operation: it returns a new array, so you have to assign the result back to a variable.
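
Putting the points above together, here is a minimal sketch of the whole flow. The small array of colour labels is a made-up stand-in for df.iloc[:,0].values (the real df is not shown in the question), and the name X_2d is only for illustration:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for df.iloc[:, 0].values: a 1D array of category labels
X = np.array(["red", "green", "blue", "green", "red"], dtype=object)

# reshape returns a new array; without assignment, X keeps its 1D shape
X.reshape(-1, 1)
print(X.shape)      # still 1D

# Assign the result so the encoder receives a 2D array with a single column
X_2d = X.reshape(-1, 1)
print(X_2d.shape)

# No LabelEncoder step: the encoder accepts the string labels directly
onehotencoder = OneHotEncoder()
X_oh = onehotencoder.fit_transform(X_2d).toarray()

print(onehotencoder.categories_)  # categories learned from the column
print(X_oh.shape)                 # one output column per category
```

Each row of X_oh has a single 1 in the column of its category, and onehotencoder.categories_ tells you which column corresponds to which label.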