There are two ways to get feature importance in LightGBM: the feature_importance() method on lightgbm.Booster, and the feature_importances_ property on lightgbm.LGBMClassifier. feature_importance() supports two importance types, "split" and "gain". My main question is: which type does feature_importances_ in the scikit-learn API use? "split", "gain", or the "mean decrease in impurity" that scikit-learn estimators default to?
By default, the .feature_importances_ property on a fitted lightgbm.sklearn estimator uses the "split" importance type.
As described in LightGBM's docs (link), the estimators from lightgbm.sklearn take a keyword argument importance_type, which controls what type of importance is returned by the feature_importances_ property.
importance_type (str, optional (default='split')).
The type of feature importance to be filled into feature_importances_. If ‘split’, result contains numbers of times the feature is used in a model. If ‘gain’, result contains total gains of splits which use the feature.
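For example, you can set it once when constructing the estimator instead of changing the attribute later. A minimal sketch, assuming toy data from make_blobs (clf_gain is just an illustrative name):
import lightgbm as lgb
from sklearn.datasets import make_blobs
# toy data, only to demonstrate the keyword argument
X, y = make_blobs(n_samples=1_000, n_features=4, centers=2)
# pass importance_type up front so feature_importances_
# reports total gain instead of split counts
clf_gain = lgb.LGBMClassifier(n_estimators=10, importance_type="gain").fit(X, y)
clf_gain.feature_importances_  # array of floats, one total-gain value per feature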
Here's an example using lightgbm==4.1.0 and Python 3.11.
import lightgbm as lgb
from sklearn.datasets import make_blobs
# generate data
X, y = make_blobs(
    n_samples=1_000,
    n_features=4,
    centers=2
)
# train a model
clf = lgb.LGBMClassifier(
    n_estimators=10,
).fit(X, y)
# .feature_importances_ defaults to "split"
clf.feature_importances_
# array([21, 1, 4, 0], dtype=int32)
# if you set importance_type to "gain", it'll be the
# cumulative gain from all splits involving that feature
clf.importance_type = "gain"
clf.feature_importances_
# array([5.21306300e+03, 3.55271008e-15, 1.27897657e-13, 0.00000000e+00])
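To connect this back to the Booster method from the question: feature_importances_ is computed from the fitted estimator's underlying Booster, which is reachable via the booster_ attribute, so the following calls should produce the same numbers as above.
# same "split" counts as the default feature_importances_
clf.booster_.feature_importance(importance_type="split")
# same total gains as feature_importances_ with importance_type = "gain"
clf.booster_.feature_importance(importance_type="gain")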