
How to get each individual tree's prediction in xgboost?


With xgboost.Booster.predict I can only get either the prediction of the whole ensemble (all trees combined) or the predicted leaf of each tree. How can I get the prediction value of each individual tree?


Solution

  • Recent versions of xgboost provide a model-slicing API, which makes Raul's answer (still valid) more complicated than necessary.

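    To get the individual predictions, iterate over the booster object. Each iteration yields a Booster holding a single boosting round (one tree for a binary objective), and you can call predict on it like on any other Booster.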

    dmatrix = xgb.DMatrix(X)  # build the DMatrix once instead of once per tree
    individual_preds = []
    for tree_ in model.get_booster():
        individual_preds.append(tree_.predict(dmatrix))
    

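    Note, however, that these individual predictions are not individual contributions. With objective='binary:logistic' each one is a probability, so summing them does not give the final prediction. Instead, transform them back into log-odds, sum those, and apply the sigmoid again. Each one-tree prediction also contains the base margin logit(base_score), so this plain sum is exact only when base_score=0.5 (base margin 0); otherwise subtract (n_trees - 1) * logit(base_score) from the summed log-odds before applying the sigmoid: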

    import numpy as np
    from scipy.special import expit as sigmoid, logit as inverse_sigmoid

    individual_preds = np.vstack(individual_preds)     # shape: (n_trees, n_samples)
    individual_logits = inverse_sigmoid(individual_preds)
    final_logits = individual_logits.sum(axis=0)       # exact only if base_score=0.5
    final_preds = sigmoid(final_logits)
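
To see why converting to log-odds works, and where base_score comes in, here is a minimal sketch in plain Python with made-up numbers. The base_score of 0.3 and the three leaf values are hypothetical, not taken from a real model. It assumes, as for binary:logistic, that the full model predicts sigmoid(base_margin + sum of leaf values) and that a one-tree slice predicts sigmoid(base_margin + its own leaf value), with base_margin = logit(base_score).

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


# Hypothetical model: base_score plus the leaf value one sample lands in per tree.
base_score = 0.3
base_margin = logit(base_score)
leaf_values = [0.4, -0.1, 0.25]

# Full model: the offset once, plus every tree's leaf value.
full_pred = sigmoid(base_margin + sum(leaf_values))

# One-tree slices: each prediction carries the offset plus only its own leaf value.
per_tree_preds = [sigmoid(base_margin + leaf) for leaf in leaf_values]
summed_logits = sum(logit(p) for p in per_tree_preds)

# Naive aggregation counts the offset once per tree ...
naive = sigmoid(summed_logits)
# ... so remove the (n_trees - 1) surplus copies of the offset.
corrected = sigmoid(summed_logits - (len(leaf_values) - 1) * base_margin)

print(f"full={full_pred:.6f} naive={naive:.6f} corrected={corrected:.6f}")
```

With base_score=0.5 the base margin is 0, so the naive sum and the corrected sum coincide, which is why the plain sum works for models trained with that setting.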
    

    A fully reproducible example, replicating Raul's:

    import numpy as np
    import xgboost as xgb
    from sklearn import datasets
    from scipy.special import expit as sigmoid, logit as inverse_sigmoid
    
    # Load data
    iris = datasets.load_iris()
    X, y = iris.data, (iris.target == 1).astype(int)
    
    # Fit a model
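    model = xgb.XGBClassifier(
        n_estimators=10,
        max_depth=10,
        objective='binary:logistic',
        # Fix base_score at 0.5 so the base margin logit(0.5) is 0 and the
        # per-tree log-odds can be summed directly (newer xgboost versions
        # otherwise estimate base_score from the training data)
        base_score=0.5,
    )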
    model.fit(X, y)
    booster_ = model.get_booster()
    
    # Extract individual predictions (each iteration yields a one-round Booster)
    dmatrix = xgb.DMatrix(X)
    individual_preds = []
    for tree_ in booster_:
        individual_preds.append(tree_.predict(dmatrix))
    individual_preds = np.vstack(individual_preds)  # shape: (n_trees, n_samples)
    
    # Aggregate individual predictions into final predictions
    individual_logits = inverse_sigmoid(individual_preds)
    final_logits = individual_logits.sum(axis=0)
    final_preds = sigmoid(final_logits)
    
    # Verify correctness
    xgb_preds = booster_.predict(xgb.DMatrix(X))
    np.testing.assert_almost_equal(final_preds, xgb_preds)