
h2o SHAP values / predict_contributions for cross validation

I've looked into the h2o.predict_contributions function, which exposes the SHAP values from XGBoost and GBM models. Does this function also provide SHAP values for the cross-validation predictions? I can't seem to find them.


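library(h2o)
library(mlbench)  # provides the Sonar data set
data(Sonar)
h2o.init()
Sonar.h2o <- as.h2o(Sonar)
mdl <- h2o.xgboost(x = setdiff(names(Sonar), "Class"), y = "Class", training_frame = Sonar.h2o, nfolds = 5, keep_cross_validation_predictions = TRUE)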


  • Yes. You can apply the function to the model trained for a single fold of interest. Here is some example code:

    library(h2o)
    h2o.init()
    prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
    prostate <- h2o.uploadFile(path = prostate_path)
    # 3-fold CV; the per-fold models are kept by default (keep_cross_validation_models = TRUE)
    prostate_gbm <- h2o.gbm(x = 3:9, y = "AGE", training_frame = prostate, nfolds = 3)
    h2o.predict(prostate_gbm, prostate)
    h2o.predict_contributions(prostate_gbm, prostate)
    # list the per-fold models and their keys (keys differ from session to session),
    # e.g. 'GBM_model_R_1557326910287_7702_cv_2'
    cv_models <- h2o.cross_validation_models(prostate_gbm)
    sapply(cv_models, function(m) m@model_id)
    # pick the fold of interest by position, or by key with h2o.getModel('<your key>')
    cv2 <- cv_models[[2]]
    h2o.predict_contributions(cv2, prostate)
    # RACE       DPROS      DCAPS         PSA        VOL     GLEASON __internal_cv_weights__ BiasTerm
    # 1 -0.006481315 -0.19211742 -0.0836791 -0.06186131 -0.9217098 -0.20128664                       0 66.37209
    # 2 -0.005238285 -1.09128833  0.9614767 -0.95340544 -0.7698430  0.06820074                       0 66.37209
    # 3 -0.006481315  0.98101193  0.1770813  1.21195042 -1.0359415 -0.23213011                       0 66.37209
    # 4  0.069538474 -0.01738315 -0.2000238  4.11799049  0.1177490 -0.01457024                       0 66.37209
    # 5  0.012923095  0.40362182 -0.1132747  1.21669090  0.9920316 -0.37245926                       0 66.37209
    # 6 -0.002282504 -0.91798097  0.9024866 -0.17398398 -0.6048008  0.42300656                       0 66.37209

    Note: you can ignore the __internal_cv_weights__ column. I've created a ticket to clean up the output; it can be tracked here.
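The fold model above is scored on the full `prostate` frame, which includes rows it was trained on. If what you want is the SHAP analogue of the cross-validation (out-of-fold) predictions, a minimal sketch is to score each CV model only on the rows it held out and stack the results. This assumes the model is trained with `keep_cross_validation_fold_assignment = TRUE`, and that the k-th model returned by `h2o.cross_validation_models()` (key suffix `_cv_k`) held out fold `k - 1`, since fold ids are 0-based; please verify that mapping on your H2O version. The names `folds`, `oof` and `oof_shap` are illustrative only.

```r
library(h2o)
h2o.init()

prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.uploadFile(path = prostate_path)

# Keep the per-fold models and the fold id of every training row
prostate_gbm <- h2o.gbm(x = 3:9, y = "AGE", training_frame = prostate, nfolds = 3,
                        keep_cross_validation_models = TRUE,
                        keep_cross_validation_fold_assignment = TRUE)

# 0-based fold id for each row of `prostate`
folds <- as.data.frame(h2o.cross_validation_fold_assignment(prostate_gbm))[[1]]
cv_models <- h2o.cross_validation_models(prostate_gbm)

oof <- vector("list", length(cv_models))
for (k in seq_along(cv_models)) {
  # Assumption: the k-th CV model was trained without the rows of fold k - 1
  holdout_rows <- which(folds == k - 1)
  contrib <- as.data.frame(
    h2o.predict_contributions(cv_models[[k]], prostate[holdout_rows, ]))
  contrib[["__internal_cv_weights__"]] <- NULL  # drop the CV bookkeeping column
  contrib$row <- holdout_rows                    # remember the original row index
  oof[[k]] <- contrib
}

# One row of out-of-fold SHAP values per training row, in the original row order
oof_shap <- do.call(rbind, oof)
oof_shap <- oof_shap[order(oof_shap$row), ]
head(oof_shap)
```

Each training row is then explained by the one fold model that did not see it, which mirrors how H2O builds the holdout predictions kept by `keep_cross_validation_predictions = TRUE`.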
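To read the contribution columns: according to the H2O documentation, for tree models the feature contributions in each row plus `BiasTerm` sum to the model's raw prediction (before any inverse link function is applied). For the regression GBM in this example the default distribution is gaussian, whose link is the identity, so the row sums should match `h2o.predict()`. A small sanity check, as a sketch; the variable names and the `1e-4` tolerance (to absorb floating-point rounding) are my own assumptions:

```r
library(h2o)
h2o.init()

prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.uploadFile(path = prostate_path)
prostate_gbm <- h2o.gbm(x = 3:9, y = "AGE", training_frame = prostate, nfolds = 3)

# Per-row feature contributions plus the BiasTerm column
contribs <- as.data.frame(h2o.predict_contributions(prostate_gbm, prostate))
# Drop the CV bookkeeping column if present (it shows up on the fold models)
contribs[["__internal_cv_weights__"]] <- NULL
preds <- as.data.frame(h2o.predict(prostate_gbm, prostate))$predict

# Identity link: BiasTerm + sum of contributions should equal the prediction
all.equal(unname(rowSums(contribs)), preds, tolerance = 1e-4)
```

For a binomial model (such as the Sonar XGBoost model in the question) the same sum lands on the link (log-odds) scale rather than on the predicted probability.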