Using two different methods of XGBoost feature importance gives me two different most important features. Which one should be believed? Which method should be used when? I am confused.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import xgboost as xgb
df = sns.load_dataset('mpg')
df = df.drop(['name', 'origin'], axis=1)  # keep numeric columns only
X = df.iloc[:, 1:]  # predictors: cylinders, displacement, horsepower, weight, acceleration, model_year
y = df.iloc[:, 0]   # target: mpg
# fit on a NumPy array: feature names are lost, so importances are indexed by position
model_xgb_numpy = xgb.XGBRegressor(n_jobs=-1, objective='reg:squarederror')
model_xgb_numpy.fit(X.to_numpy(), y.to_numpy())
plt.bar(range(len(model_xgb_numpy.feature_importances_)), model_xgb_numpy.feature_importances_)
# fit on the pandas DataFrame: feature names are kept and shown in the plot
model_xgb_pandas = xgb.XGBRegressor(n_jobs=-1, objective='reg:squarederror')
model_xgb_pandas.fit(X, y)
axsub = xgb.plot_importance(model_xgb_pandas)
The NumPy approach shows the 0th feature, cylinders, as the most important; the pandas approach shows model_year as the most important. Which one is the CORRECT most important feature?
It is hard to define THE correct feature importance measure; each has pros and cons. It is a wide topic with no golden rule as of now, and I would personally suggest reading this online book by Christoph Molnar: https://christophm.github.io/interpretable-ml-book/. It has an excellent overview of the different measures and algorithms.
As a rule of thumb, if you cannot use an external package, I would choose gain, as it is more representative of what one actually cares about: one is typically not interested in the raw number of splits on a particular feature, but rather in how much those splits helped. See this question for a good summary: https://datascience.stackexchange.com/q/12318/53060. If you can use other tools, shap exhibits very good behaviour, and I would always choose it over the built-in XGBoost tree measures, unless computation time is strongly constrained.
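For illustration, a minimal sketch of both options, reusing the model_xgb_pandas fitted above (the second part assumes the shap package is installed):

# built-in measure: pull gain-based scores straight from the trained booster
gain_scores = model_xgb_pandas.get_booster().get_score(importance_type='gain')
print(sorted(gain_scores.items(), key=lambda kv: kv[1], reverse=True))

# shap-based measure: per-feature SHAP values shown as a summary plot
import shap
explainer = shap.TreeExplainer(model_xgb_pandas)
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)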
As for the difference you point at directly in your question, the root of it is that xgb.plot_importance uses weight as its default importance type, while XGBModel itself uses gain as the default. If you configure them to use the same importance type, you will get similar distributions (up to the additional normalisation in feature_importances_ and the sorting in plot_importance).
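To see this concretely, here is a sketch that puts both on the same footing (gain), using the model fitted in your question; note that features never used in a split are missing from the booster dictionary but show up as 0 in feature_importances_:

# make the plot report gain instead of its default weight
xgb.plot_importance(model_xgb_pandas, importance_type='gain')

# reproduce feature_importances_ by hand: gain scores normalised to sum to 1
raw_gain = model_xgb_pandas.get_booster().get_score(importance_type='gain')
total = sum(raw_gain.values())
print({feat: score / total for feat, score in raw_gain.items()})
print(model_xgb_pandas.feature_importances_)  # same numbers, in column order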