Tags: python-3.x, machine-learning, scikit-learn, logistic-regression, xgboost

How to get glm-like odds ratios from xgboost?


I have successfully run a machine learning model using xgboost on Python 3.8.5, but I am struggling to interpret the results.

The output/target is binary: deceased or not deceased.

Both my audience and I understand odds ratios well, like the ones produced by R's glm, and I'm sure that xgboost can display this information somehow, but I can't figure out how.

My first instinct is to look at the output of xgboost's predict_proba, but when I do that, I get:

>>> deceased.pp.view()
array([[0.5828363 , 0.4171637 ],
       [0.89795643, 0.10204358],
       [0.5828363 , 0.4171637 ],
       [0.89795643, 0.10204358]], dtype=float32)

I'm assuming that these are the p values that would go into the formula p/(1-p) to calculate an odds ratio for each input term, like sex and age.

I found a similar question on this website but the answer doesn't help me:

xgboost predict_proba : How to do the mapping between the probabilities and the labels

Based on the answer there, I used .classes_ and got this:

>>> deceased.xg_clf.classes_
array([False,  True])
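For what it's worth, the mapping between `classes_` and predict_proba is just positional: column i of predict_proba is the probability of `classes_[i]`. A small sketch with the values shown above:

```python
# Sketch: predict_proba columns line up with clf.classes_, so a row can be
# read as a {class_label: probability} mapping.
import numpy as np

classes = np.array([False, True])          # as returned by clf.classes_
row = np.array([0.5828363, 0.4171637])     # one row of predict_proba
prob_by_class = dict(zip(classes.tolist(), row.tolist()))
print(prob_by_class[True])                 # probability of "deceased"
```

But this maps output classes to probabilities, not input features to probabilities, which is the crux of the question.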

However, .classes_ tells me nothing about which input categories, e.g. age or sex, have what probabilities.

In fact, I'm not even sure that xgboost can give glm-like odds ratios at all; the closest thing seems to be feature_importances_, but feature importance doesn't convey the same information that odds ratios do.

How can I link classes_ with the input categories? Or, if that approach is wrong or impossible, how else can I calculate odds ratios for each input variable in xgboost?


Solution

  • Agreed that XGBoost doesn't really lend itself to something like an odds ratio: a glm's odds ratio comes from exponentiating a single, global coefficient per feature, while a tree ensemble has no such coefficients. Have you looked at other forms of model interpretability designed for more complex models like XGBoost? shap, for example, is a library that can provide a similar kind of per-feature analysis and is better suited to these types of models.