I am running multiple imputation (MI) with MICE on a categorical variable and want descriptive statistics (counts and proportions in each level).
How can I get the pooled standard error for the proportion in each level?
Could this be done with pool.scalar?
What I have done:
library(mice)
library(plyr)  # assuming count() is plyr::count, which returns a "freq" column
data1 <- nhanes2
## MI with mice
imp.data <- mice(data = data1, m = 5, maxit = 10, seed = 12345, method = "cart")
## stack all the imputed data sets into one long data frame
data2 <- complete(imp.data, "long")
## get the counts for each level
counts <- count(data2$hyp)
## average over the m = 5 imputed data sets
counts$n <- counts$freq / 5
First, I converted hyp to 0s and 1s instead of "yes" and "no". Then I calculated the proportion per group using prop.table and prop.test from this other SO answer, and then I used this RStudio thread to calculate the standard error. Finally, I followed the pooling rules from Heymans and Eekhout (2019).
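For reference, the standard pooling rules for a single scalar quantity (Rubin's rules) can be written as below. The notation is chosen here for illustration: $Q_i$ is the proportion in imputed data set $i$, $U_i = Q_i (1 - Q_i) / n$ is its squared standard error, $m$ is the number of imputations, and $n$ is the sample size.

$$
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m} Q_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m} \left(Q_i - \bar{Q}\right)^2
$$

$$
T = \bar{U} + \left(1 + \frac{1}{m}\right) B, \qquad
\mathrm{SE}_{\text{pooled}} = \sqrt{T}
$$

The pooled standard error therefore combines the average within-imputation variance $\bar{U}$ with the between-imputation variance $B$.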
library(mice)
library(dplyr)
set.seed(12345)
data1 <- nhanes2 %>% mutate(hyp = ifelse(hyp == "no", 0, 1))
imp.data <- mice (data = data1, m = 5, maxit = 10, seed = 12345, method = "cart", printFlag = FALSE)
data2 <- complete(imp.data, "long")
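pooled_vals <- by(data2, data2$.imp, function(x) {
  n <- nrow(x)
  # prop.test() takes the number of "successes" and the number of trials
  p_yes <- unname(prop.test(sum(x$hyp == 1), n)$estimate)
  p_no <- 1 - p_yes
  c(
    prop.table(table(x$hyp == 1)),           # Proportions (FALSE = no, TRUE = yes)
    se_yes = sqrt(p_yes * (1 - p_yes) / n),  # SE of hyp being yes: sqrt(p * (1 - p) / n)
    se_no = sqrt(p_no * (1 - p_no) / n)      # SE of hyp being no (same value for a binary variable)
  )
})
# Average each quantity over the m = 5 imputed data sets
avg <- Reduce("+", pooled_vals) / length(pooled_vals)
avg[c("FALSE", "TRUE")]
FALSE TRUE
0.7840000 0.2160000
# Note: a plain average of the SEs omits the between-imputation variance term of Rubin's rules
avg[c("se_yes", "se_no")]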
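As for pool.scalar, here is a minimal sketch of how it could be used for this, assuming the binomial formula p(1 - p)/n is an acceptable within-imputation variance for each proportion. The names Q, U and pooled are only illustrative. It computes the proportion of hyp == 1 in each completed data set and passes the estimates and their variances to pool.scalar, which applies Rubin's rules.

```r
library(mice)

# Recode hyp to 0/1, as in the code above (NA stays NA and gets imputed)
data1 <- transform(nhanes2, hyp = ifelse(hyp == "no", 0, 1))
imp <- mice(data1, m = 5, maxit = 10, seed = 12345,
            method = "cart", printFlag = FALSE)

n <- nrow(data1)

# Q: proportion of hyp == 1 in each completed data set
Q <- sapply(seq_len(imp$m), function(i) {
  mean(mice::complete(imp, i)$hyp == 1)
})

# U: within-imputation variance of each proportion (squared SE)
U <- Q * (1 - Q) / n

# Pool with Rubin's rules
pooled <- pool.scalar(Q, U, n = n)

pooled$qbar     # pooled proportion of hyp == 1
sqrt(pooled$t)  # pooled standard error (within + between variance)
```

For a binary variable the other level has proportion 1 - pooled$qbar and the same standard error. For a variable with more than two levels, the same steps could be repeated with one indicator per level. With proportions close to 0 or 1, pooling on a transformed scale (for example the logit) may behave better than pooling the raw proportions.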