After using mice to create 50 imputations of my dataset, I am keen to use the glmnet package to run an elastic net. I understand that the appropriate way to analyse imputed data is to apply the with and pool functions to the mids object created when mice(x, ...) is run, but glmnet requires its data to be supplied as a matrix. Both model.matrix and build.x can be used to convert a generic data frame to a matrix. The mids object can be converted to a data frame; however, using the available data as a single stacked dataset would appear to undermine the whole imputation process.
Example:
library(mice)
library(useful)  # provides build.x

df <- mice::nhanes
imp <- mice(df)                     # impute data
com <- complete(imp, "long", TRUE)  # stacks all imputations (plus the original data) into one data frame
mat <- build.x(bmi ~ age + hyp + chl, com, contrasts = FALSE)
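One way to avoid stacking everything into a single data frame is to keep the imputations separate and build one matrix per imputed dataset. A sketch (assuming the useful package for build.x/build.y; complete(imp, "all") is the mice call that returns the m imputed data frames as a list):

library(mice)
library(useful)  # build.x / build.y

imp <- mice(mice::nhanes, printFlag = FALSE)

# complete(imp, "all") returns a list with one completed data frame
# per imputation, so the imputations are never merged
com_list <- complete(imp, "all")

# build a predictor matrix and response vector for each imputation
mats <- lapply(com_list, function(d) {
  list(x = build.x(bmi ~ age + hyp + chl, d, contrasts = FALSE),
       y = build.y(bmi ~ age + hyp + chl, d))
})

Each element of mats can then be passed to glmnet separately, preserving the between-imputation variability.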
Assuming the imputations are accurate, what is the most appropriate way to preserve the imputations and create the relevant matrices for use in glmnet?
The easiest way to do this is to use my glmnetUtils package, which implements a formula/data frame interface for glmnet. You can then fit your elastic net as you would with any other R model-building function.
install.packages("glmnetUtils")
library(glmnetUtils)
# ... do whatever is required to create an analysis data frame ...
glmnet(bmi ~ age + hyp + chl, data=com)
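To keep the imputations separate rather than feeding glmnet one stacked data frame, you could fit the model once per imputed dataset. A sketch (assuming glmnetUtils::cv.glmnet for a cross-validated lambda; the choice alpha = 0.5 is just an illustrative elastic-net mixing value):

library(mice)
library(glmnetUtils)

imp <- mice(mice::nhanes, printFlag = FALSE)

# fit one elastic net per imputed dataset
fits <- lapply(complete(imp, "all"), function(d) {
  cv.glmnet(bmi ~ age + hyp + chl, data = d, alpha = 0.5)
})

# coefficients at the cross-validated lambda for each imputation
coefs <- sapply(fits, function(f) as.vector(coef(f, s = "lambda.min")))
rowMeans(coefs)  # crude summary across imputations

Note that simply averaging penalized coefficients is only a rough summary; Rubin's pooling rules do not directly apply to penalized estimates, since glmnet does not produce the within-imputation variances that pool expects.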