Tags: python, machine-learning, scikit-learn, random-forest, lightgbm

Feature importance with LightGBM


I have trained a model using several algorithms, including Random Forest from scikit-learn and LightGBM, and these models perform similarly in terms of accuracy and other statistics.

The issue is the inconsistent behavior between these two algorithms in terms of feature importance. I used the default parameters, and I know that they use different methods for calculating feature importance, but I would expect the features most highly correlated with the target to have the most influence on the model's predictions in both cases. The Random Forest result makes more sense to me because the highly correlated features appear at the top, while this is not the case for LightGBM.

Is there a way to explain this behavior, and is the LightGBM result trustworthy enough to present?

Random Forest feature importance
[feature importance bar plot]

LightGBM feature importance
[feature importance bar plot]

Correlation with target
[correlation bar plot]
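For reference, a minimal sketch of how the two importance plots can be produced; the synthetic dataset and feature names below are stand-ins for the actual data:

```python
import lightgbm as lgb
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (the real dataset is not shown here)
X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=4, random_state=42)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])

# Random Forest: importances are mean decrease in impurity
rf = RandomForestClassifier(random_state=42).fit(X, y)
pd.Series(rf.feature_importances_, index=X.columns) \
    .sort_values().plot.barh(title="Random Forest")
plt.show()

# LightGBM with default parameters; the default importance_type
# is 'split', i.e. the number of times a feature is used in the trees
lgbm = lgb.LGBMClassifier(random_state=42).fit(X, y)
lgb.plot_importance(lgbm, title="LightGBM (split)")
plt.show()
```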


Solution

  • I have had a similar issue. The default feature importance for LightGBM is based on 'split', which counts how many times a feature is used in the trees, whereas 'gain' measures the total reduction in loss contributed by the feature's splits. When I changed the importance type to 'gain', the plots gave similar results.
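
A minimal sketch of that change, assuming `lgbm` is a fitted `LGBMClassifier` like the one in the sketch above:

```python
import lightgbm as lgb

# 'gain' ranks features by total loss reduction from their splits,
# which is closer in spirit to Random Forest's impurity-based importance
lgb.plot_importance(lgbm, importance_type="gain", title="LightGBM (gain)")

# Equivalently, the raw values from the underlying Booster:
gain_importance = lgbm.booster_.feature_importance(importance_type="gain")

# Or set it once on the sklearn wrapper, so that the
# feature_importances_ attribute reports gain instead of split counts:
model = lgb.LGBMClassifier(importance_type="gain")
```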