
Lightgbm, Force use of all features


I am playing around with LightGBM to classify some binary files. After a lot of searching, I am still unable to find a way to force LightGBM to use all of the features in my dataset. When training starts it reports 83390 data points (files, in my case), but only 5XXX features used. I tried changing parameters like "min_data_in_leaf", but it doesn't really change anything.

Can someone explain how to tune LightGBM so that it uses all the features I have?

2019-02-16 17:02:03,969 Train model
[LightGBM] [Warning] Starting from the 2.1.2 version, default value for the "boost_from_average" parameter in "binary" objective is true.
This may cause significantly different results comparing to the previous versions of LightGBM.
Try to set boost_from_average=false, if your old models produce bad results
[LightGBM] [Info] Number of positive: 41695, number of negative: 41695
[LightGBM] [Info] Total Bins 494351
[LightGBM] [Info] Number of data: 83390, number of used features: 5937
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000

Solution

  • LightGBM automatically disables features that cannot be split on, such as features where almost all values are zero (or identical). A feature with only one distinct value can never produce a split, so it is excluded during the binning step and does not count toward "number of used features".
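The effect can be illustrated with a small sketch in plain Python (the function name and toy data are my own, not part of the LightGBM API): a column whose values are all identical offers no possible split point, which mirrors why the log reports fewer "used features" than columns in the dataset.

```python
def count_splittable_features(feature_columns):
    """Count the columns with more than one distinct value.

    A constant column (e.g. all zeros) can never produce a split,
    so LightGBM's binning step would exclude it from the
    "number of used features" it reports at training start.
    """
    return sum(len(set(col)) > 1 for col in feature_columns)

# Toy dataset: 5 feature columns, but two of them are constant.
features = [
    [0.1, 0.7, 0.3, 0.9],  # varies -> usable
    [0, 0, 0, 0],          # constant zeros -> dropped
    [1, 1, 1, 1],          # constant ones -> dropped
    [5, 2, 8, 5],          # varies -> usable
    [0, 0, 0, 1],          # varies -> usable
]

print(count_splittable_features(features))  # -> 3
```

If features are being dropped because they are merely *sparse* rather than truly constant, LightGBM exposes parameters such as `min_data_in_bin` (and, in newer versions, `feature_pre_filter`) that control this pre-filtering; a genuinely constant feature, however, can never be used regardless of settings.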