machine-learning classification decision-tree data-analysis missing-data

Treating a missing value as-is in a decision tree

I have a dataset in which some variables (both categorical and numerical) have missing values. For example, I have a numerical variable "area" that is split into two columns, "area (today)" and "area (-1 day)". If a row is categorized as a "new comer", it has no value for "area (-1 day)". So the usual missing-value handling, such as removing rows or imputing the mean, does not work here. Do I have to label the missing "area (-1 day)" value as a category, even though the variable is originally numeric? Or are there any other suggestions?

Solution

Treating the newcomer as a separate class makes sense, because that is how your dataset already treats it: you have a separate area column for it.
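One way to sketch the "separate class" idea while keeping the column numeric is to add a 0/1 newcomer flag and fill the missing value with a placeholder. This is only an illustration: the toy data, the labels, the helper `add_newcomer_flag`, and the fill value `-1.0` are all assumptions, not something taken from your dataset.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: columns are area (today), area (-1 day).
# np.nan marks newcomers, who have no value for the previous day.
X_raw = np.array([
    [50.0, 48.0],
    [62.0, 60.0],
    [55.0, np.nan],
    [70.0, np.nan],
    [45.0, 47.0],
    [66.0, np.nan],
])
y = np.array([0, 0, 1, 1, 0, 1])  # made-up labels, for illustration only


def add_newcomer_flag(X, col=1, fill_value=-1.0):
    """Fill NaNs in `col` with a placeholder and append a 0/1 'is newcomer' column."""
    X = X.copy()
    is_new = np.isnan(X[:, col])
    X[is_new, col] = fill_value  # placeholder outside the real range of areas
    return np.column_stack([X, is_new.astype(float)])


X = add_newcomer_flag(X_raw)
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# A new row for a newcomer goes through the same transformation before predicting.
pred = clf.predict(add_newcomer_flag(np.array([[58.0, np.nan]])))
```

With the flag in place, the tree can split on "is newcomer" directly, so the placeholder only needs to be a value that cannot be confused with a real area.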

Otherwise, you can look into other imputation techniques to suit your use case. Regression imputation, which predicts the missing value from the other columns, might be a good fit.
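As a rough sketch of regression imputation, the snippet below fits a least-squares line on the rows where "area (-1 day)" is present and uses it to fill the rows where it is missing. The helper `regression_impute` and the toy numbers are hypothetical, and it assumes the predictor columns themselves have no missing values.

```python
import numpy as np


def regression_impute(X, target_col, predictor_cols):
    """Fill NaNs in target_col with predictions from a linear least-squares fit."""
    X = X.astype(float).copy()
    missing = np.isnan(X[:, target_col])
    # Design matrix: intercept column plus the predictor columns.
    A = np.column_stack([np.ones(len(X)), X[:, predictor_cols]])
    # Fit only on rows where the target is observed.
    coef, *_ = np.linalg.lstsq(A[~missing], X[~missing, target_col], rcond=None)
    # Predict the target for rows where it is missing.
    X[missing, target_col] = A[missing] @ coef
    return X


# Hypothetical toy data: columns are area (today), area (-1 day).
data = np.array([
    [50.0, 48.0],
    [62.0, 60.0],
    [45.0, 43.0],
    [55.0, np.nan],  # newcomer
])
filled = regression_impute(data, target_col=1, predictor_cols=[0])
```

Whether a predicted "previous day" value is meaningful for a row that had no previous day is a modeling decision, so it may be worth comparing this against the flag approach on held-out data.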

HTH
