Tags: r, imputation, auc, r-mice

AUC of logistic and ordinal model following multiple imputation using MICE (with R)


I have a question about the added predictive value of including a variable in a logistic model and an ordinal model. I am using mice to impute missing covariates, and I am having difficulty finding a way to calculate the AUC and R-squared of the pooled imputed models. Does anyone have any advice?

The summary output of the pooled model only provides the term, estimate, std.error, statistic, df, and p.value.

Example code:

imputed_Data <- mice(Cross_sectional, m = 10, predictorMatrix = predM,
                     seed = 500, method = meth)
Imputedreferecemodel <- with(imputed_Data,
                             glm(Poor ~ age + sex + education + illness + injurycause,
                                 family = "binomial", na.action = na.omit))
summary(pool(Imputedreferecemodel))

Many thanks.


Solution

  • You could use the psfmi package together with mice. Its pool_performance function pools performance measures, including the AUC (C-statistic) and R-squared, for logistic regression models fitted on multiply imputed data. According to the documentation:

    pool_performance Pooling performance measures for logistic and Cox regression models.

    Below is a reproducible example using the nhanes dataset, which ships with mice.

    # install.packages("devtools")
    # devtools::install_github("mwheymans/psfmi") # for installing package
    library(psfmi)
    library(mice)
    
    # Make reproducible data with 0 and 1 outcome variable
    set.seed(123)
    nhanes$hyp <- ifelse(nhanes$hyp==1,0,1)
    nhanes$hyp <- as.factor(nhanes$hyp)
    
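    # Impute with mice (5 imputations, 5 iterations each)
    imp <- mice(nhanes, m=5, maxit=5) 
    
    # Stack the imputed datasets in long format; .imp identifies the imputation
    nhanes_comp <- complete(imp, action = "long", include = FALSE)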
    
    pool_lr <- psfmi_lr(data=nhanes_comp, nimp=5, impvar=".imp", 
                        formula=hyp ~ bmi, method="D1")
    pool_lr$RR_model
    #> $`Step 1 - no variables removed -`
    #>          term    estimate std.error   statistic       df   p.value        OR
    #> 1 (Intercept) -0.76441322 3.4753113 -0.21995532 16.06120 0.8286773 0.4656071
    #> 2         bmi -0.01262911 0.1302484 -0.09696177 15.79361 0.9239765 0.9874503
    #>      lower.EXP upper.EXP
    #> 1 0.0002947263 735.56349
    #> 2 0.7489846190   1.30184
    
    # Check performance
    pool_performance(pool_lr, data = nhanes_comp, formula = hyp ~ bmi, 
                     nimp=5, impvar=".imp", 
                     cal.plot=TRUE, plot.indiv="mean", 
                     groups_cal=4, model_type="binomial")
    #> Warning: argument plot.indiv is deprecated; please use plot.method instead.
    

    #> $ROC_pooled
    #>                     95% Low C-statistic 95% Up
    #> C-statistic (logit)  0.2731      0.5207 0.7586
    #> 
    #> $coef_pooled
    #> (Intercept)         bmi 
    #> -0.76441322 -0.01262911 
    #> 
    #> $R2_pooled
    #> [1] 0.009631891
    #> 
    #> $Brier_Scaled_pooled
    #> [1] 0.004627443
    #> 
    #> $nimp
    #> [1] 5
    #> 
    #> $HLtest_pooled
    #>        F_value    P(>F) df1      df2
    #> [1,] 0.9405937 0.400953   2 31.90878
    #> 
    #> $model_type
    #> [1] "binomial"
    

    Created on 2022-12-02 with reprex v2.0.2
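
To make the pooled C-statistic less of a black box, here is a minimal sketch of one way to compute it by hand with mice and base R. It is not psfmi's implementation: it assumes that each imputed dataset gets its own AUC and that the values are averaged on the logit scale, which is what the "C-statistic (logit)" label in the output above suggests. The helper names auc_rank and pool_auc_logit are made up for this illustration, and the sketch returns a point estimate only, with no confidence interval.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# case has a higher predicted risk than a randomly chosen non-case.
# y must be coded 0/1; p is the predicted probability.
auc_rank <- function(y, p) {
  r  <- rank(p)                      # mid-ranks, so ties count as 1/2
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Pool per-imputation AUCs: average on the logit scale, then back-transform.
# Assumption: no AUC is exactly 0 or 1 (the logit would be infinite).
pool_auc_logit <- function(aucs) {
  plogis(mean(qlogis(aucs)))
}

# Usage with mice, mirroring the nhanes example (runs only if mice is installed)
if (requireNamespace("mice", quietly = TRUE)) {
  library(mice)
  dat <- nhanes
  dat$hyp <- factor(ifelse(dat$hyp == 1, 0, 1))   # recode outcome to 0/1
  imp <- mice(dat, m = 5, maxit = 5, seed = 123, printFlag = FALSE)

  # One AUC per completed dataset
  aucs <- sapply(seq_len(imp$m), function(i) {
    d   <- complete(imp, i)
    fit <- glm(hyp ~ bmi, family = binomial, data = d)
    auc_rank(as.integer(d$hyp == "1"), fitted(fit))
  })
  print(pool_auc_logit(aucs))
}
```

The same pattern applies to the model objects in the question: loop over the completed datasets, fit the reference model and the extended model in each, compute the measure of interest per dataset, and pool at the end. Because the imputations here are drawn with a different seed handling than in the reprex above, the number printed need not match the psfmi output.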