
Pooling counts for a categorical variable after MI?


I am doing multiple imputation (MI) on a categorical variable with MICE, for descriptive statistics (counts and proportions in each level).

How can I get the pooled standard error for the proportion in each level? Could this be done with pool.scalar?

What I have done:

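library(mice)
library(plyr)  # count() below is assumed to be plyr::count(), which returns a `freq` column

data1 <- nhanes2

## MI with mice
imp.data <- mice(data = data1, m = 5, maxit = 10, seed = 12345, method = "cart")

## stack all the imputed data sets into one long data frame
data2 <- complete(imp.data, "long")

## get the counts for each level, summed over all imputed data sets
counts <- count(data2$hyp)

## average over the m = 5 imputed data sets
counts$n <- counts$freq / 5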

Solution

  • First, I converted hyp to 0s and 1s instead of "yes" and "no". Then I calculated the proportion per group using prop.table and prop.test from this other SO answer, and then I used this RStudio thread to calculate the standard error. Finally, I followed the pooling rules from Heymans and Eekhout (2019).

    library(mice)
    library(dplyr)
    set.seed(12345)
    
    data1 <- nhanes2 %>% mutate(hyp = ifelse(hyp == "no", 0, 1))
    imp.data <- mice (data = data1, m = 5, maxit = 10, seed = 12345, method = "cart", printFlag = FALSE)
    data2 <- complete(imp.data, "long")
    
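    # One data frame per imputed data set
    by_imp <- split(data2, data2$.imp)
    
    # Proportion in each level (FALSE = "no", TRUE = "yes"), averaged over the m = 5 imputations
    props <- lapply(by_imp, function(x) c(prop.table(table(x$hyp == 1))))
    Reduce("+", props) / length(props)
    FALSE  TRUE 
    0.784 0.216 
    
    # Per-imputation proportion of hyp == 1 and its squared SE, p * (1 - p) / n.
    # For a binary variable the SE is the same for both levels.
    p <- sapply(by_imp, function(x) unname(prop.test(sum(x$hyp == 1), nrow(x))$estimate))
    u <- p * (1 - p) / nrow(data1)
    
    # Rubin's rules: pooled estimate (qbar) and pooled SE (square root of the total variance t)
    pooled <- pool.scalar(Q = p, U = u, n = nrow(data1))
    c(estimate = pooled$qbar, se = sqrt(pooled$t))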
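
The pooling rules mentioned above (Rubin's rules) can also be spelled out in base R. This is only a minimal sketch: `pool_proportion` is a hypothetical helper, not part of mice, and the per-imputation proportions are made up for illustration rather than taken from the nhanes2 run. It assumes the usual form of the rules, where the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```r
# Rubin's rules for one scalar estimate (here: a proportion).
# `p` holds the estimate from each imputed data set, `n` is the sample size.
# pool_proportion() is a hypothetical helper written for this example.
pool_proportion <- function(p, n) {
  m       <- length(p)                 # number of imputations
  qbar    <- mean(p)                   # pooled proportion
  within  <- mean(p * (1 - p) / n)     # mean of the squared per-imputation SEs
  between <- var(p)                    # variance of the estimates across imputations
  total   <- within + (1 + 1 / m) * between
  c(estimate = qbar, se = sqrt(total))
}

# Hypothetical per-imputation proportions of hyp == 1 (illustration only)
p_hyp <- c(0.20, 0.24, 0.20, 0.24, 0.20)
res <- pool_proportion(p_hyp, n = 25)
print(round(res, 4))
```

For a variable with more than two levels, the same helper could be applied to each level's proportions in turn. Passing the same estimates and squared SEs to mice's pool.scalar as Q and U should give matching values in its qbar and t components.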