Tags: python, pandas, scikit-learn, categorical-data, one-hot-encoding

How can I fix a TypeError from OneHotEncoder?


My problem is that I need to convert several categorical columns into numbers for machine learning. I don't want to use LabelEncoder because I heard it's less suitable than OneHotEncoder for unordered categories: it assigns arbitrary integer codes that a model may treat as ordered.

So I used this code:

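from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

X = df.drop("SalePrice", axis=1)
y = df["SalePrice"]
one_hot = OneHotEncoder()
transformer = ColumnTransformer([("one_hot", one_hot, categorical_features)], remainder="passthrough")
transformed_X = transformer.fit_transform(X)  # fit on the features X, not on df (which still contains SalePrice)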

Here, categorical_features is the list of columns I want to one-hot encode.

But I get a long traceback whose final line is:

TypeError: Encoders require their input to be uniformly strings or numbers. Got ['float', 'str']
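The message can be reproduced with a single object column that mixes text labels and a float. This is a minimal sketch on made-up data (the column names Street, LotArea and SalePrice and their values are hypothetical, not taken from the question's dataset); the exact wording of the message varies slightly between scikit-learn versions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: "Street" is an object column holding str AND float values.
df = pd.DataFrame({
    "Street": ["Pave", "Grvl", 1.5],
    "LotArea": [8450, 9600, 11250],
    "SalePrice": [208500, 181500, 223500],
})
X = df.drop("SalePrice", axis=1)

transformer = ColumnTransformer(
    [("one_hot", OneHotEncoder(), ["Street"])], remainder="passthrough"
)

error_message = None
try:
    transformer.fit_transform(X)
except TypeError as exc:
    # The encoder cannot sort a mix of str and float categories, so it raises.
    error_message = str(exc)
    print(error_message)
```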

Someone with a similar issue was told to clean their data by removing NaN values. I have already done that, but nothing changed. I was also told to convert the data types of my columns to strings, and I wrote a loop to do that:

[Screenshot of the conversion loop; image not available]
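A quick way to check whether a conversion like this actually took effect is to list the Python types present in each categorical column. This is a minimal sketch; the helper name mixed_type_columns and the columns Alley, Street and MSZoning are hypothetical. Note that a NaN left in a text column counts as a float.

```python
import pandas as pd


def mixed_type_columns(frame, columns):
    """Hypothetical helper: return {column: sorted type names} for columns with more than one type."""
    report = {}
    for col in columns:
        # Collect the Python type name of every cell, e.g. "str" or "float".
        type_names = sorted({type(v).__name__ for v in frame[col]})
        if len(type_names) > 1:
            report[col] = type_names
    return report


# Hypothetical data: "Alley" has strings plus NaN, "Street" has a stray float.
df = pd.DataFrame({
    "Alley": ["Grvl", float("nan"), "Pave"],
    "Street": ["Pave", "Grvl", 1.5],
    "MSZoning": ["RL", "RM", "RL"],
})
print(mixed_type_columns(df, ["Alley", "Street", "MSZoning"]))
```

Any column that shows up in the report still mixes types and will trigger the encoder error; an empty dict means every listed column is uniform.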


Solution

  • This error is fairly self-explanatory: the encoder cannot handle a column that contains both str and float values.

    You said categorical_features is the list of columns you want to one-hot encode, so make sure that every value within each of those columns has the same type.

    You can try the following to force every value in those columns to be a string:

    for e in categorical_features:
        # astype(str) turns every value into a string, including NaN, which becomes "nan"
        df[e] = df[e].astype(str)
    

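    Alternatively, if every value in a column should be a float, you may have a different problem in your data, such as stray text in a numeric column. In that case, use something like Series.str.isnumeric() to find the values that are not numbers; note that it returns False for decimals such as "1.5".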
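Putting the string conversion together with the question's ColumnTransformer gives roughly the following. It is a minimal sketch on hypothetical data (Street, Alley, LotArea and SalePrice values are made up); handle_unknown="ignore" is an added choice, not part of the original code, that makes the fitted transformer encode categories unseen during fitting as all zeros instead of raising.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the housing data in the question.
df = pd.DataFrame({
    "Street": ["Pave", "Grvl", 1.5, "Pave"],
    "Alley": ["Grvl", float("nan"), "Pave", "Grvl"],
    "LotArea": [8450, 9600, 11250, 9550],
    "SalePrice": [208500, 181500, 223500, 140000],
})
categorical_features = ["Street", "Alley"]

# Force every categorical value to str; NaN becomes the literal string "nan".
for col in categorical_features:
    df[col] = df[col].astype(str)

X = df.drop("SalePrice", axis=1)  # features only; the target stays out
y = df["SalePrice"]

transformer = ColumnTransformer(
    [("one_hot", OneHotEncoder(handle_unknown="ignore"), categorical_features)],
    remainder="passthrough",
)
transformed_X = transformer.fit_transform(X)  # fit on X, not df
print(transformed_X.shape)  # (4, 7): 3 Street + 3 Alley categories + LotArea
```

Casting to str hides a stray value rather than fixing it: 1.5 becomes its own "1.5" category here, so if such values are unexpected, checking the column for non-numeric or non-text entries first is the safer route.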