python pandas scikit-learn one-hot-encoding

OneHotEncoder : ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

from sklearn.preprocessing import OneHotEncoder

df.LotFrontage = df.LotFrontage.fillna(value = 0)
categorical_mask = (df.dtypes == "object")
categorical_columns = df.columns[categorical_mask].tolist()
ohe = OneHotEncoder(categories = categorical_mask, sparse = False)
df_encoded = ohe.fit_transform(df)
print(df_encoded[:5, :])

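ERROR: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

May I know what's wrong with my code?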

This is a snippet of the data:

[Image: output of df.head()]

Solution

The categories argument of OneHotEncoder is not there to select which features to encode; for that you need a ColumnTransformer. Passing the boolean mask (a pandas Series) as categories is most likely what triggers the ValueError. Try this:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

df.LotFrontage = df.LotFrontage.fillna(value=0)
# Names of the columns to one-hot encode (object dtype, i.e. strings)
categorical_features = df.select_dtypes("object").columns

column_trans = ColumnTransformer(
    [
        ("onehot_categorical", OneHotEncoder(), categorical_features),
    ],
    remainder="passthrough",  # or "drop" if you don't want the non-categoricals at all
)
df_encoded = column_trans.fit_transform(df)
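To see what this produces, here is a minimal sketch on a made-up two-column DataFrame. The name `toy` and its values are assumptions for illustration only, not the asker's data; the `Street` column is given `object` dtype explicitly so that `select_dtypes("object")` picks it up.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the real DataFrame
toy = pd.DataFrame(
    {
        "LotFrontage": [65.0, np.nan, 80.0],
        "Street": pd.Series(["Pave", "Grvl", "Pave"], dtype="object"),
    }
)
toy["LotFrontage"] = toy["LotFrontage"].fillna(0)

# Select the columns to encode by dtype
categorical_features = toy.select_dtypes("object").columns.tolist()

column_trans = ColumnTransformer(
    [("onehot_categorical", OneHotEncoder(), categorical_features)],
    remainder="passthrough",  # keep LotFrontage unchanged
)
encoded = column_trans.fit_transform(toy)

# The result can be a sparse matrix depending on its density; densify if so
if hasattr(encoded, "toarray"):
    encoded = encoded.toarray()
encoded = np.asarray(encoded)

print(encoded)
```

The encoded columns come first (one per level of `Street`, in sorted order), followed by the passed-through `LotFrontage` column.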

Note that, according to the docs, the categories argument is:

categories : 'auto' or a list of array-like, default='auto'
    Categories (unique values) per feature:

    'auto' : Determine categories automatically from the training data.

    list : categories[i] holds the categories expected in the ith column. The passed categories should not mix strings and numeric values within a single feature, and should be sorted in case of numeric values.

So it should hold every possible category (level) of each of the categorical features. You might use this if you know the full set of possible levels but suspect your training data might omit some. In your case, I don't think you'll need it, so 'auto', i.e. the default, should be fine.
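For completeness, here is a small sketch of the case where categories is useful. The `train` frame and the level names are assumptions for illustration: the training data contains only one of two known levels, so the full list is passed explicitly, one list per encoded column.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training data in which the level "Grvl" never appears
train = pd.DataFrame({"Street": ["Pave", "Pave", "Pave"]})

# One list of levels per encoded column (here a single column)
ohe = OneHotEncoder(categories=[["Grvl", "Pave"]])

# fit_transform returns a sparse matrix by default; convert it to a dense array
encoded = ohe.fit_transform(train).toarray()

print(ohe.categories_)
print(encoded)
```

With 'auto', the encoder would only have learned the single level present in `train` and produced one output column; with the explicit list it produces a column for each listed level.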