I'm trying to fill NaN categorical values using CategoricalImputer from sklearn_pandas.
from sklearn_pandas import CategoricalImputer
imputer = CategoricalImputer()
nan_columns = train_df.loc[:, train_df.isnull().any()]
for column in nan_columns:
    imputer.fit_transform(column)
But imputer.fit_transform(column) gives me this error:
AttributeError: 'str' object has no attribute 'copy'
I'm doing this following the documentation. Where am I going wrong?
Edit:
I added this cell:
from sklearn.impute import SimpleImputer
nan_columns = train_df.loc[:, train_df.isnull().any()]
imputer = SimpleImputer(strategy="most_frequent")
imputer.fit_transform(train_df)
msno.bar(train_df.sample(1000), labels=True, fontsize=8)
However, it didn't work. This is the bar graph showing that there are still missing values in the columns:
You can use SimpleImputer from scikit-learn with categorical values by setting `strategy="most_frequent"`.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

imp = SimpleImputer(strategy="most_frequent")
df = pd.DataFrame({"x": ["a", "a", np.nan],
                   "y": ["c", np.nan, "c"],
                   "z": ["a", np.nan, np.nan]})
print(df)
df[:] = imp.fit_transform(df)
print(df)
yields
     x    y    z
0    a    c    a
1    a  NaN  NaN
2  NaN    c  NaN
   x  y  z
0  a  c  a
1  a  c  a
2  a  c  a
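Note that fit_transform returns a new array and leaves the input DataFrame untouched, which is probably why the cell in the question's edit still shows missing values: the result of imputer.fit_transform(train_df) is never assigned back. A minimal sketch of the difference, using a small made-up frame (the names `filled`, `missing_before_assign` and `missing_after_assign` are only for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Small illustrative frame; dtype=object keeps the columns as plain Python strings.
df = pd.DataFrame({"x": ["a", "a", np.nan],
                   "y": ["c", np.nan, "c"]}, dtype=object)
imp = SimpleImputer(strategy="most_frequent")

# fit_transform returns a new NumPy array; df itself is not modified.
filled = imp.fit_transform(df)
missing_before_assign = int(df.isnull().sum().sum())

# Rebuilding the frame from the result (or assigning it back) is what removes the NaNs.
df = pd.DataFrame(filled, columns=df.columns, index=df.index)
missing_after_assign = int(df.isnull().sum().sum())

print(missing_before_assign, missing_after_assign)
```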
If you only want to use it on string or categorical columns:
for col, tp in df.dtypes.items():
    if tp == object or tp.name == "category":
        df[col] = imp.fit_transform(df[[col]])
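As for the AttributeError in the question: iterating over a DataFrame yields its column labels, not the columns themselves, so the loop there passes a plain string to fit_transform. That would be consistent with the 'str' object has no attribute 'copy' message, although I have not traced it through CategoricalImputer itself. A small sketch of the iteration behaviour, with a made-up frame and illustrative variable names:

```python
import numpy as np
import pandas as pd

# Illustrative frame: only column "x" contains a missing value.
df = pd.DataFrame({"x": ["a", np.nan], "y": ["c", "c"]}, dtype=object)
nan_columns = df.loc[:, df.isnull().any()]

# Iterating over a DataFrame yields column labels (here, strings).
labels = [column for column in nan_columns]

# To get the data, index the DataFrame with each label.
series = [df[label] for label in nan_columns]

print(labels)
print(type(series[0]).__name__)
```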