Tags: r, list, average, prediction, ggeffects

Average of multiple ggpredict objects


I have fitted the same model to 10 imputed versions of a dataset, which gives me 10 model fits, as follows:

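library(mice)
library(ggeffects)  # provides ggpredict()

data("nhanes")
head(nhanes)

# Create 10 imputed datasets and stack them in long format
imp <- mice(nhanes, print = FALSE, m = 10, seed = 24415)
df <- complete(imp, "long")

# Fit the same model to each imputed dataset
model_fit <- lapply(1:10, function(i) {
  lm(bmi ~ age + hyp + chl, data = subset(df, .imp == i))
})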

From these fits I get a separate ggpredict object for each model:

ggpredict(model_fit[[1]], c("age", "hyp"))
ggpredict(model_fit[[2]], c("age", "hyp"))
ggpredict(model_fit[[3]], c("age", "hyp"))
ggpredict(model_fit[[4]], c("age", "hyp"))
ggpredict(model_fit[[5]], c("age", "hyp"))
ggpredict(model_fit[[6]], c("age", "hyp"))
ggpredict(model_fit[[7]], c("age", "hyp"))
ggpredict(model_fit[[8]], c("age", "hyp"))
ggpredict(model_fit[[9]], c("age", "hyp"))
ggpredict(model_fit[[10]], c("age", "hyp"))

I am looking for an efficient way to a) estimate the average of all the ggpredict objects by hyp and age.

The expected output would look like this.

  age    hyp     Predicted                         95% C.I.
  ------------------------------------------------------------------------------------------------------
  1      1       (28.38 + 29.35 + 27.3...)/10      (26.67 + 26.83 + 25.25...)/10 ; (30.08 + 31.87 + 29.35...)/10
  2      1       (24.21 + 26.01 + 25.40...)/10     (22.71 + 23.56 + 23.54...)/10 ; (25.71 + 28.46 + 27.26...)/10
  3      1       (20.05 + 22.67 + 23.51...)/10     (17.13 + 17.89 + 20.08...)/10 ; (22.96 + 27.44 + 26.94...)/10

  1      2       (31.82 + 29.35 + 28.87...)/10     (28.07 + 23.58 + 24.24...)/10 ; (35.58 + 35.13 + 33.49...)/10
  2      2       (27.66 + 26.01 + 26.97...)/10     (24.88 + 22.02 + 23.43...)/10 ; (30.43 + 30.00 + 30.52...)/10
  3      2       (23.49 + 22.67 + 25.08...)/10     (20.63 + 18.73 + 21.50...)/10 ; (26.35 + 26.61 + 28.66...)/10

b) Plot the final averaged values using ggplot.

So far I have tried storing the result of each ggpredict call in a list and running

Reduce(`+`, list_ggpred) / length(list_ggpred)

but I got this warning:

In Ops.factor(left, right) : '+' not meaningful for factors

Any suggestions highly appreciated. Thanks.
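
For context on the warning: as far as I can tell, each ggpredict result is a data frame that includes a factor column (group, here hyp), and `+` is not defined for factors, so Reduce() cannot add the objects cell by cell. A minimal sketch of the plain average asked for in a), assuming toy data frames whose column names mimic ggpredict output (make_pred and all values are made up for illustration), is to stack the results and average the numeric columns within each x/group cell:

```r
# Toy stand-ins for ggpredict() results: the column names mimic ggeffects
# output, but the values are invented for illustration only.
make_pred <- function(shift) {
  data.frame(
    x         = rep(1:3, times = 2),
    predicted = c(28, 24, 20, 31, 27, 23) + shift,
    conf.low  = c(26, 22, 17, 28, 24, 20) + shift,
    conf.high = c(30, 26, 23, 34, 30, 26) + shift,
    group     = factor(rep(c("1", "2"), each = 3))
  )
}
list_ggpred <- lapply(c(-1, 0, 1), make_pred)

# Stack all results, then average the numeric columns per x/group cell.
# The factor column is used only as a grouping key, never added.
stacked <- do.call(rbind, lapply(list_ggpred, as.data.frame))
naive_avg <- aggregate(
  cbind(predicted, conf.low, conf.high) ~ x + group,
  data = stacked,
  FUN = mean
)
naive_avg
```

Note that averaging the confidence limits this way ignores the variation between imputed datasets, so the resulting intervals are likely too narrow.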


Solution

  • Simply use pool_predictions():

    library(ggeffects)
    # example for multiple imputed datasets
    data("nhanes2", package = "mice")
    imp <- mice::mice(nhanes2, printFlag = FALSE)
    predictions <- lapply(1:5, function(i) {
      m <- lm(bmi ~ age + hyp + chl, data = mice::complete(imp, action = i))
      predict_response(m, "age")
    })
    pool_predictions(predictions)
    #> # Predicted values of bmi
    #> 
    #> age   | Predicted |       95% CI
    #> --------------------------------
    #> 20-39 |     30.09 | 28.17, 32.02
    #> 40-59 |     24.60 | 21.48, 27.72
    #> 60-99 |     21.75 | 18.24, 25.26
    #> 
    #> Adjusted for:
    #> * hyp =     no
    #> * chl = 196.32
    
    # and:
    # pool_predictions(predictions) |> plot()
    

    Created on 2024-04-30 with reprex v2.1.0

    Note that pool_predictions() accounts for the missing values / multiple imputations and adjusts the standard errors, so you get slightly larger confidence intervals than you would by just taking the mean.
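
To illustrate why pooled intervals come out wider than a plain average, here is a rough sketch of Rubin's rules, the standard way of combining estimates across imputed datasets (the approach behind mice::pool()). I am assuming pool_predictions() works along these lines; the exact internals may differ. The estimates and standard errors below are made-up numbers for a single x value, one per imputed dataset.

```r
# Hypothetical predictions and standard errors for one x value,
# one entry per imputed dataset (invented numbers).
est <- c(28.4, 29.3, 27.3, 28.9, 28.1)
se  <- c(0.85, 1.26, 1.02, 0.95, 1.10)
m   <- length(est)

pooled_est <- mean(est)   # pooled point estimate: plain mean of the estimates
within     <- mean(se^2)  # average within-imputation variance
between    <- var(est)    # between-imputation variance of the estimates

# Rubin's rules: total variance adds the between-imputation part,
# inflated by (1 + 1/m) because m is finite.
total_var <- within + (1 + 1 / m) * between
pooled_se <- sqrt(total_var)

naive_se <- mean(se)      # what simply averaging the standard errors gives

c(pooled_est = pooled_est, naive_se = naive_se, pooled_se = pooled_se)
```

The point estimate is the same either way; only the standard error grows, because disagreement between the imputed datasets is counted as extra uncertainty.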