Tags: python, categories, decision-tree, categorical-data, lightgbm

LightGBM: Are negative values (i.e., missing values) in categorical features treated as a separate category?


According to LightGBM's documentation (linked below), the description of the categorical_feature parameter states that "All negative values in categorical features will be treated as missing values."

https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier

My question is: are the negative values (i.e., missing values) in categorical features treated as a separate category? Or are they simply treated as missing values and not included as a category in the model?

Many thanks in advance.


Solution

  • Either way :) The NaNs are not discarded: rows with missing values stay in the training data and are grouped in whichever way minimizes the error.
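
To make "grouped in a way that minimizes error" concrete, here is a toy sketch in plain Python. It is not LightGBM's implementation (LightGBM's split finding is histogram-based and may route missing values differently in detail). The helper names `normalize`, `sse`, and `best_split` are made up for this illustration. The sketch only shows the idea: negative codes are mapped to a missing marker, and the rows holding them are kept and placed on whichever side of a split gives the lower squared error.

```python
from itertools import combinations

MISSING = None  # stand-in for NaN in this toy example


def normalize(codes):
    """Map negative category codes to MISSING, mirroring the documented rule."""
    return [MISSING if c is None or c < 0 else c for c in codes]


def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(codes, targets):
    """Try every category subset and both placements of the missing rows.

    Returns (error, left_categories, missing_side), or None when there are
    fewer than two observed categories to split on.
    """
    codes = normalize(codes)
    cats = sorted({c for c in codes if c is not MISSING})
    best = None
    for size in range(1, len(cats)):
        for left_cats in combinations(cats, size):
            for missing_side in ("left", "right"):
                left, right = [], []
                for c, y in zip(codes, targets):
                    if c is MISSING:
                        goes_left = missing_side == "left"
                    else:
                        goes_left = c in left_cats
                    (left if goes_left else right).append(y)
                err = sse(left) + sse(right)
                if best is None or err < best[0]:
                    best = (err, set(left_cats), missing_side)
    return best


codes = [0, 0, 1, 1, -1, -1]  # -1 plays the role of a missing value
targets = [1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
error, left_cats, missing_side = best_split(codes, targets)
print(error, left_cats, missing_side)
```

In this toy data the rows coded -1 have the same target as category 1, so the lowest-error split sends category 0 left and puts the missing rows on the right with category 1. The rows are never dropped; they simply end up wherever the error is smallest.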