python scikit-learn xgboost

XGBoost: get pred_contribs using the sklearn API?


In Python, XGBoost allows you to train/predict using their Booster class or using their sklearn API (http://xgboost.readthedocs.io/en/latest/python/python_api.html). I'm using the sklearn API, and want to use the pred_contribs capabilities of XGBoost. I would expect this to work, but it doesn't:

    model = xgb.XGBClassifier().fit(X_train, y_train)
    pred = model.predict_proba(X_test, pred_contribs=True)

It looks like pred_contribs is only a parameter of the Booster class's predict function. How do I use this parameter through the sklearn API? Or is there an easy workaround to get the prediction contributions after training with the sklearn API?


Solution

  • Call get_booster() on the XGBClassifier once it has been fitted with training data; it returns the underlying Booster object.

    Then call predict() on that Booster object with pred_contribs=True.

    Example code:

    from xgboost import XGBClassifier, DMatrix
    from sklearn.datasets import load_iris
    
    iris_data = load_iris()
    
    # Take only the first 100 samples (classes 0 and 1) to make this a binary problem;
    # with all three classes it is multi-class and the shape of the pred_contribs output changes
    X, y = iris_data.data[:100], iris_data.target[:100]
    
    # This data has 4 features
    print(X.shape)  # (100, 4)
    
    
    clf = XGBClassifier()
    clf.fit(X, y)
    
    # This is what you need
    booster = clf.get_booster()
    
    
    # Use a single sample for prediction; multiple rows work the same way.
    # Slicing with X[:1] keeps the input a 2-D NumPy array.
    test_X = X[:1]
    
    # Wrap test_X in a DMatrix, as needed by Booster.predict()
    predictions = booster.predict(DMatrix(test_X), pred_contribs=True)
    
    # The output has 5 columns: one per feature, plus a final bias column
    print(predictions.shape)  # (1, 5)