python scikit-learn nan logistic-regression data-cleaning

Dealing with NaN values when creating models in Python


I have to explain a variable (Y) using a matrix of variables (X). Y, the variable I want to explain, contains NaN values, and they make up half of my observations. Should I delete the rows where Y is NaN?

X: an integer matrix, already cleaned, with no NaN values. Y: takes the values YES, NO or NaN. How should I handle this?

Thank you!


Solution

  • To answer this question, I think you first need to answer a more fundamental one: "is Y=NaN a label?"

    If you want the trained model to predict one of three labels (Yes, No, NaN), then fill the NaNs with a label (e.g. "Missing"). If you only want Yes or No, I don't see how you can train on observations that have no target. Drop those rows and train on the rest.
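A minimal sketch of both options, assuming pandas and scikit-learn's `LogisticRegression` (suggested by the question's tags). The toy `X` and `y` values are made up for illustration, and `"Missing"` is just the example label from the answer.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: X is a clean integer matrix, y holds "YES", "NO" or NaN.
X = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8],
                  "b": [0, 1, 0, 1, 1, 0, 1, 0]})
y = pd.Series(["YES", np.nan, "NO", "YES", np.nan, "NO", np.nan, "YES"])

# Option 1: NaN is a meaningful outcome, so turn it into its own class.
# The model then predicts one of three labels: "Missing", "NO", "YES".
y_filled = y.fillna("Missing")
model_3 = LogisticRegression(max_iter=1000).fit(X, y_filled)

# Option 2: only YES/NO matter, so drop the rows with no target.
# Using the same boolean mask on X and y keeps the rows aligned.
mask = y.notna()
X_labeled, y_labeled = X[mask], y[mask]
model_2 = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
```

The important detail in option 2 is filtering `X` with the same mask as `y`. Dropping NaNs from `y` alone (e.g. `y.dropna()`) would leave `X` with more rows than there are targets.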