I have gone through "replace missing values in categorical data", which covers handling missing values in categorical data.

My dataset has about 6 categorical columns with missing values. This is for a binary classification problem.

I see different approaches: one is to leave the missing values in the categorical columns as they are, another is to impute using from sklearn.preprocessing import Imputer, but I am unsure which is the better option.

If imputing is the better option, which libraries could I use before applying a model such as LR, Decision Tree, or RandomForest?

Thanks!

There are multiple ways to handle missing data.
More details on imputing values in sklearn: https://scikit-learn.org/stable/modules/impute.html
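
As a rough illustration of the two options raised in the question, here is a minimal sketch that either keeps "missing" as its own category or imputes the most frequent value, then one-hot encodes and fits a logistic regression. The toy data, the column names (color, size), and the helper make_model are all hypothetical. Note that Imputer was removed from sklearn.preprocessing in scikit-learn 0.22; the sketch assumes a recent version and uses sklearn.impute.SimpleImputer instead.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: two categorical columns with missing values.
X = pd.DataFrame(
    {
        "color": ["red", "blue", np.nan, "red", "blue", np.nan, "red", "blue"],
        "size": ["S", np.nan, "L", "L", "S", "S", np.nan, "L"],
    },
    dtype=object,
)
y = np.array([0, 1, 0, 0, 1, 1, 0, 1])  # binary target
cat_cols = ["color", "size"]


def make_model(imputer):
    """Hypothetical helper: impute, one-hot encode, then fit a classifier."""
    categorical = Pipeline(
        [
            ("impute", imputer),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]
    )
    pre = ColumnTransformer([("cat", categorical, cat_cols)])
    return Pipeline([("pre", pre), ("clf", LogisticRegression())])


# Option 1: treat missing values as their own category.
as_category = make_model(SimpleImputer(strategy="constant", fill_value="missing"))

# Option 2: replace missing values with the most frequent category per column.
most_frequent = make_model(SimpleImputer(strategy="most_frequent"))

as_category.fit(X, y)
most_frequent.fit(X, y)

preds_as_category = as_category.predict(X)
preds_most_frequent = most_frequent.predict(X)
```

Which option works better depends on the data, so one reasonable approach is to compare both pipelines with cross-validation (for example sklearn.model_selection.cross_val_score) on the real dataset. The LogisticRegression step can be swapped for DecisionTreeClassifier or RandomForestClassifier without changing the preprocessing.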