machine-learning classification decision-tree data-analysis missing-data

Treat missing value as it is in Decision Tree


I have a dataset in which some variables (both categorical and numerical) have missing values. For example, I have a numerical variable "area" that is split into two columns, "area (today)" and "area (-1 day)". If a row is categorized as a "new comer", it has no value for "area (-1 day)". So the usual missing-value handling, such as removing the rows or imputing the mean, does not work here. Do I have to label the missing "area (-1 day)" value as a category, even though the variable is originally numeric? Or are there any other suggestions?


Solution

  • Treating newcomers as a separate class makes sense, because that is how you are already treating them in your dataset: you have a separate area column for it.

    Otherwise, you can look at other imputation techniques to see which suits your use case. Regression imputation might be a good fit.

    HTH