Tags: python, machine-learning, xgboost, feature-selection, xgbclassifier

Does xgBoost's relative feature importance vary with datapoints in test set?


I'm working on a binary classification dataset and applying an XGBoost model to the problem. Once the model is trained, I plot the feature importance and one of the trees from the underlying gradient-boosted ensemble. Please find these plots below.

[Feature importance plot] [Plot of a single tree]

Questions

  • If I take a test set of, say, 10 datapoints, does the importance of the features vary from datapoint to datapoint in the computation of each datapoint's predict_proba score?
  • By analogy with a CNN's class activation map, which varies from datapoint to datapoint, do the ordering and relative importance of the features stay the same when the model runs on multiple datapoints, or do they vary?

Solution

  • What do you mean by "datapoint"? Is a datapoint a single case/subject/patient/etc.? If so:

    1. The feature importance plot and the tree you plotted both relate only to the model; they are independent of the test set. Finding out which features were important in categorising a specific subject/case/datapoint in the test set is a more challenging task (see e.g. the XGBoostExplainer package: https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211).

    2. The ordering and relative importance of the features are different for each subject/case/datapoint (see above), and there is no 'class activation map' in xgboost - all the data is analysed, and data that is deemed 'not important' simply does not contribute to the final decision.
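    The distinction above can be sketched in Python (XGBoostExplainer itself is an R package, but xgboost's built-in `pred_contribs` option exposes the same idea via per-row SHAP contributions). The synthetic data here is illustrative only:

    ```python
    # Sketch: global importance is a property of the fitted model, while
    # per-datapoint feature contributions vary from row to row.
    import numpy as np
    import xgboost as xgb

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 4))                     # toy data, 4 features
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)     # binary target

    model = xgb.XGBClassifier(n_estimators=20, max_depth=3)
    model.fit(X, y)

    # Global importance: fixed once the model is trained, regardless of test set.
    print(model.feature_importances_)

    # Per-datapoint contributions: one row per test point, one column per
    # feature plus a final bias column. These differ between datapoints.
    booster = model.get_booster()
    contribs = booster.predict(xgb.DMatrix(X[:5]), pred_contribs=True)
    print(contribs.shape)  # (5, 5): 4 feature columns + 1 bias column
    ```

    For each row, the contributions (plus the bias term) sum to that row's raw margin score, which is what makes this a per-datapoint decomposition rather than a single global ranking.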

    EDIT

    Further example of XGBoostExplainer output: [example waterfall plot, example_1.png]