python machine-learning missing-data xgboost imputation

How can I save filled missing data after using XGBClassifier?


I have a dataset with missing values, which is not a problem for XGBClassifier: it can dynamically fill the values for you. I want to save the features as XGBClassifier fills them. My aim is to use XGBoost to impute the missing data, and then try other algorithms that don't accept NaN values. Is this possible?


Solution

  • XGBoost can handle missing values, but it does not fill them. So the answer is no: you cannot use it to somehow populate the missing values in a feature.

    At training time, it handles missing data by learning a default direction (left or right child) at each split for rows whose value is missing, choosing whichever direction minimises the loss. The whole mechanism consists of routing those rows down the better branch; no value is ever imputed.

    This is mentioned in the publication:

    The optimal default directions are learnt from the data. The key improvement is to only visit the non-missing entries I_k. The presented algorithm treats the non-presence as a missing value and learns the best direction to handle missing values.
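
To make the "default direction" idea concrete, here is a minimal sketch in plain Python of the choice made at a single split. It is not XGBoost's actual code. The function name `best_default_direction`, the toy data, the fixed threshold, and `reg_lambda=1.0` are all assumptions for illustration. The score is the usual regularised structure score with constant terms dropped, and the gradient statistics assume squared error with an initial prediction of 0.

```python
import math


def best_default_direction(values, grads, hess, threshold, reg_lambda=1.0):
    """Choose where rows with a missing value should go at one split.

    values: feature values, with float('nan') marking missing entries.
    grads, hess: per-row first- and second-order gradient statistics.
    Returns (direction, score), where direction is 'left' or 'right'.
    """
    g_left = h_left = g_right = h_right = 0.0
    g_miss = h_miss = 0.0
    for x, g, h in zip(values, grads, hess):
        if math.isnan(x):
            # Missing rows are only accumulated, never given a value.
            g_miss += g
            h_miss += h
        elif x < threshold:
            g_left += g
            h_left += h
        else:
            g_right += g
            h_right += h

    def score(gl, hl, gr, hr):
        # Higher score means lower loss after the split.
        return gl * gl / (hl + reg_lambda) + gr * gr / (hr + reg_lambda)

    # Try sending all missing rows left, then all missing rows right.
    score_left = score(g_left + g_miss, h_left + h_miss, g_right, h_right)
    score_right = score(g_left, h_left, g_right + g_miss, h_right + h_miss)
    if score_left >= score_right:
        return "left", score_left
    return "right", score_right


nan = float("nan")
values = [1.0, 2.0, nan, 8.0, 9.0, nan]      # toy feature with two gaps
targets = [0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
grads = [-y for y in targets]                # squared error, prediction 0
hess = [1.0] * len(targets)

direction, score = best_default_direction(values, grads, hess, threshold=5.0)
print(direction, score)
print(values)  # the NaN entries are still NaN
```

In this toy case the rows with a missing value have the same target as the rows on the right side of the split, so sending them right gives the better score. The feature column itself is unchanged afterwards, so there is nothing to "save" as an imputed value. To feed the data to algorithms that reject NaN, a separate imputation step is needed.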