Tags: r, machine-learning, xgboost, shap

Customizing labels in SHAPforxgboost plots


I'm creating plots of SHAP scores to visualize a model I built with xgboost. I've used the SHAPforxgboost package, which has worked very well, and I now want to use the figures (especially the one from shap.plot.summary()) in a text document I'm writing. However, the font sizes of the labels and titles on the x- and y-axes are very small, and I was wondering whether there is a way to make them larger and more readable.

My setup is very similar to the one shown here: https://www.rdocumentation.org/packages/SHAPforxgboost/versions/0.0.2

library("SHAPforxgboost")
y_var <-  "diffcwv"
dataX <- dataXY_df[,-..y_var]
# hyperparameter tuning results
param_dart <- list(objective = "reg:linear",  # For regression
                   nrounds = 366,
                   eta = 0.018,
                   max_depth = 10,
                   gamma = 0.009,
                   subsample = 0.98,
                   colsample_bytree = 0.86)

mod <- xgboost::xgboost(data = as.matrix(dataX), label = as.matrix(dataXY_df[[y_var]]),
                       params = param_dart, nrounds = param_dart$nrounds,
                       verbose = FALSE, nthread = parallel::detectCores() - 2,
                       early_stopping_rounds = 8)

# To return the SHAP values and ranked features by mean|SHAP|
shap_values <- shap.values(xgb_model = mod, X_train = dataX)
# The ranked features by mean |SHAP|
shap_values$mean_shap_score

# To prepare the long-format data:
shap_long <- shap.prep(xgb_model = mod, X_train = dataX)
# is the same as: using given shap_contrib
shap_long <- shap.prep(shap_contrib = shap_values$shap_score, X_train = dataX)
# (Notice that there will be a data.table warning from `melt.data.table` due to `dayint` coerced from integer to double)

# **SHAP summary plot**
shap.plot.summary(shap_long)

The output of shap.plot.summary() looks something like this: [image: SHAP summary plot]

More specifically, I'm interested in increasing the font size of each descriptor on the y-axis.


Solution

  • Looking at the package source code, the plot is built with ggplot2, so you should be able to override the default label size by adding a theme() layer to the returned plot.

    Using the example for the shap.plot.summary.wrap2 function:

    library("SHAPforxgboost")
    library("ggplot2")
    
    data("iris")
    X1 = as.matrix(iris[,-5])
    mod1 = xgboost::xgboost(
            data = X1, label = iris$Species, gamma = 0, eta = 1,
            lambda = 0,nrounds = 1, verbose = FALSE)
    
    
    # shap.values(model, X_dataset) returns the SHAP
    # data matrix and ranked features by mean|SHAP|
    shap_values <- shap.values(xgb_model = mod1, X_train = X1)
    shap_values$mean_shap_score
    #> Petal.Length  Petal.Width Sepal.Length  Sepal.Width 
    #>   0.62935975   0.21664035   0.02910357   0.00000000
    shap_values_iris <- shap_values$shap_score
    
    # shap.prep() returns the long-format SHAP data from either the model or
    # precomputed SHAP values
    shap_long_iris <- shap.prep(xgb_model = mod1, X_train = X1)
    # is the same as: using given shap_contrib
    shap_long_iris <- shap.prep(shap_contrib = shap_values_iris, X_train = X1)
    
    # **SHAP summary plot**
    # shap.plot.summary(shap_long_iris, scientific = TRUE)
    # shap.plot.summary(shap_long_iris, x_bound  = 1.5, dilute = 10)
    
    # Alternative options to make the same plot:
    # option 1: from the xgboost model
    # shap.plot.summary.wrap1(mod1, X = as.matrix(iris[,-5]), top_n = 3)
    
    # option 2: supply a self-made SHAP values dataset
    # (e.g. sometimes as output from cross-validation)
    shap.plot.summary.wrap2(shap_values_iris, X1, top_n = 3) +
            ggplot2::theme(axis.text.y = element_text(size = 20))
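
The same approach should extend to the axis titles and the x-axis tick labels mentioned in the question. Below is a minimal sketch that needs only ggplot2: the plot `p` built from `mtcars` is a hypothetical stand-in for the object returned by shap.plot.summary(), and the point sizes are arbitrary choices rather than recommended values.

```r
library(ggplot2)

# Hypothetical stand-in for the ggplot object returned by shap.plot.summary();
# any ggplot object accepts the same theme() layer.
p <- ggplot(mtcars, aes(x = mpg, y = factor(cyl))) +
  geom_point() +
  labs(x = "SHAP value (impact on model output)", y = NULL)

# Collect the text-size overrides in one reusable theme object.
# Sizes are in points and are illustrative only.
larger_text <- theme(
  axis.text.y  = element_text(size = 20),  # labels along the y-axis
  axis.text.x  = element_text(size = 14),  # tick labels along the x-axis
  axis.title.x = element_text(size = 16),  # x-axis title
  axis.title.y = element_text(size = 16)   # y-axis title, if one is shown
)

# Adding the theme returns a new plot; the original p is left unchanged.
p_large <- p + larger_text
```

Assuming the summary plot is an ordinary ggplot object, as the answer above suggests, `shap.plot.summary(shap_long) + larger_text` would apply the same sizes to it.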