python machine-learning scikit-learn one-hot-encoding

How do I one-hot encode a single column in a dataframe?


I have a dataframe called "vehicles" with 8 columns. Seven are numerical, but the column named 'Car_name', which is at index 1 in the dataframe, is categorical, and I need to encode it.

I tried this code, but it doesn't work:

ohe = OneHotEncoder(categorical_features = [1])

vehicles_enc = ohe.fit_transform(vehicles).toarray()

TypeError: OneHotEncoder.__init__() got an unexpected keyword argument 'categorical_features'

However, the same code works perfectly in a YouTube video I followed.


Solution

  • It seems you are using a newer version of scikit-learn than the one in the video you are watching. The categorical_features argument was deprecated in scikit-learn 0.20 and removed in 0.22, so OneHotEncoder no longer accepts it.

    Instead, you can use ColumnTransformer to specify which columns to encode, something like this:

    from sklearn.preprocessing import OneHotEncoder
    from sklearn.compose import ColumnTransformer
    
    ohe = OneHotEncoder()
    
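    column_transformer = ColumnTransformer(
        transformers=[
            ('ohe', ohe, [1])  # Index of 'Car_name' column
        ],
        remainder='passthrough',  # Keep the other columns as they are
        sparse_threshold=0  # Always return a dense array, so no .toarray() is needed
    )
    
    # The encoded columns come first, followed by the passthrough columns
    vehicles_enc = column_transformer.fit_transform(vehicles)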
    

    As an extra recommendation, always check which library versions a tutorial uses, and consult the official documentation for the versions you have installed.
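
A quick way to compare your environment with a tutorial is to print the installed version and inspect the constructor signature. This is only a sketch: the variable names are illustrative, and it simply reports whether your OneHotEncoder still accepts categorical_features.

```python
import inspect

import sklearn
from sklearn.preprocessing import OneHotEncoder

# Installed scikit-learn version, to compare with the one used in the tutorial.
print("scikit-learn version:", sklearn.__version__)

# Parameters accepted by OneHotEncoder in this installation.
params = inspect.signature(OneHotEncoder.__init__).parameters
print("accepts categorical_features:", "categorical_features" in params)
```

If the second line prints False, code that passes categorical_features will raise the TypeError shown in the question.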